To Marinka, Gregor and Rebeka Hana


I still think that the best way to learn a new idea is to see its history, to see why someone 
was forced to go through the painful and wonderful process of giving birth to a new idea. 
... Otherwise, it is impossible to guess how anyone could have discovered or invented it. 


— Gregory J. Chaitin, Meta Maths, The Quest for Omega 


Preface 


Context 


The paradoxes discovered in Cantor’s set theory sometime around 1900 began a 
crisis that shook the foundations of mathematics. In order to reconstruct mathemat- 
ics, freed from all paradoxes, Hilbert introduced a promising program with formal 
systems as the central idea. Though the program was unexpectedly brought to a 
close in 1931 by Gödel’s famous theorems, it bequeathed burning questions: “What
is computing? What is computable? What is an algorithm? Can every problem be 
algorithmically solved?” This led to Computability Theory, which was born in the 
mid-1930s, when these questions were resolved by the seminal works of Church, 
Gödel, Kleene, Post, and Turing. In addition to contributing to some of the greatest
advances of twentieth-century mathematics, their ideas laid the foundations for the 
practical development of a universal computer in the 1940s as well as the discovery 
of a number of algorithmically unsolvable problems in different areas of science. 
New questions, such as “Are unsolvable problems equally difficult? If not, how can 
we compare their difficulty?” initiated new research topics of Computability Theory, 
which in turn delivered many important concepts and theorems. The application of 
these is central to the multidisciplinary research of Computability Theory. 


Aims 


Monographs in Theoretical Computer Science usually strive to present as much of 
the subject as possible. To achieve this, they present the subject in a
definition–theorem–proof style and, when appropriate, merge and intertwine different
related themes, such as computability, computational complexity, automata theory,
and formal-language theory. This approach, however, often blurs historical circum- 
stances, reasons, and the motivation that led to important goals, concepts, methods, 
and theorems of the subject. 
My aim is to compensate for this. Since the fundamental ideas of theoretical
computer science were either motivated by historical circumstances in the field or 
developed by pure logical reasoning, I describe Computability Theory, a part of 
Theoretical Computer Science, from this point of view. Specifically, I describe the 
difficulties that arose in mathematical logic, the attempts to recover from them, and 
how these attempts led to the birth of Computability Theory and later influenced it. 
Although some of these attempts fell short of their primary goals, they put forward 
crucial questions about computation and led to the fundamental concepts of Com- 
putability Theory. These in turn logically led to still new questions, and so on. By 
describing this evolution I want to give the reader a deeper understanding of the 
foundations of this beautiful theory. The challenge in writing this book was there- 
fore to keep it accessible by describing the historical and logical development while 
at the same time introducing as many modern topics as needed to start the research. 
Thus, I will be happy if the book makes good reading before one tackles more ad- 
vanced literature on Computability Theory. 


Contents 


There are three parts in this book. 

Part I (Chaps. 1-4) Chapter 1 is introductory: it discusses the intuitive compre- 
hension of the concept of the algorithm. This comprehension was already provided 
by Euclid and sufficed since 300 B.C.E. or so. In the next three chapters I explain 
how the need for a rigorous, mathematical definition of the concepts of the algo- 
rithm, computation, and computability was born. Chapter 2 describes the events 
taking place in mathematics around 1900, when paradoxes were discovered. The 
circumstances that led to the paradoxes and consequently to the foundational crisis 
in mathematics are explained. The ideas of the three main schools of recovery— 
intuitionism, logicism, and formalism—that attempted to reconstruct mathematics 
are described. Chapter 3 delves into formalism. This school gathered the ideas and 
results of other schools in the concept of the formal axiomatic system. Three partic- 
ular systems that played crucial roles in events are described; these are the formal 
axiomatic systems of logic, arithmetic, and set theory. Chapter 4 presents Hilbert’s 
Program, a promising formalistic attempt that would use formal axiomatic systems 
to eliminate all the paradoxes from mathematics. It is explained how Hilbert’s Program
was unexpectedly shattered by Gödel’s Incompleteness Theorems, which state,
in effect, that not every truth can be proved (in a formal system). 

Part II (Chaps. 5-9) Hilbert’s Program left open a question about the existence 
of a particular algorithm, the algorithm that would solve the Entscheidungsproblem. 
Since this algorithm might not exist, it was necessary to formalize the concept of 
the algorithm—only then would a proof of the non-existence of the algorithm be 
possible. Therefore, Chapter 5 discusses the fundamental questions: “What is an 
algorithm? What is computation? What does it mean when we say that a function 
or problem is computable?” It is explained how these intuitive, informal concepts 
were formally defined in the form of the Computability (Church–Turing) Thesis by
different yet equivalent models of computation, such as μ-recursive functions and
(general) recursive functions, λ-calculus, the Turing machine, the Post machine,
and Markov algorithms. Chapter 6 focuses on the Turing machine, which most 
convincingly formalized the intuitive concepts of computation. Several equivalent 
variants of the Turing machine are described. Three basic uses of the Turing ma- 
chine are presented: function computation, set generation, and set recognition. The 
existence of the universal Turing machine is proved and its impact on the creation 
and development of general-purpose computers is described. The equivalence of 
the Turing machine and the RAM model of computation is proved. In Chapter 7, 
the first basic yet important theorems are deduced. These include the relations be- 
tween decidable, semi-decidable, and undecidable sets (i.e., decision problems), the 
Padding Lemma, the Parameter (i.e., s-m-n) Theorem, and the Recursion Theorem. 
The latter two are also discussed in view of the recursive procedure calls in the mod- 
ern general-purpose computer. Chapter 8 is devoted to incomputability. It uncovers 
a surprising fact that, in effect, not everything that is defined can be computed (on 
a usual model of computation). Specifically, the chapter shows that not every com- 
putational problem can be solved by a computer. First, the incomputability of the 
Halting Problem is proved. To show that this is not just a unique event, a list of 
selected incomputable problems from various fields of science is given. Then, in 
Chapter 9, methods of proving the incomputability of problems are explained; in 
particular, proving methods by diagonalization, reduction, the Recursion Theorem, 
and Rice’s Theorem are explained. 

Part III (Chaps. 10-15) In this part attention turns to relative computability. 
I tried to keep the chapters “bite-sized” by focusing in each on a single issue only. 
Chapter 10 introduces the concepts of the oracle and the oracle Turing machine, 
describes how computation with such an external help would run, and discusses 
how oracles could be replaced in the real world by actual databases or networks of 
computers. Chapter 11 formalizes the intuitive notion of the “degree of unsolvabil- 
ity” of a problem. To do this, it first introduces the concept of Turing reduction, the 
most general reduction between computational problems, and then the concept of 
Turing degree, which formalizes the notion of the degree of unsolvability. This for- 
malization makes it possible to define, in Chapter 12, an operator called the Turing 
jump and, by applying it, to construct a hierarchy of infinitely many Turing degrees. 
Thus, a surprising fact is discovered that for every unsolvable problem there is a 
more difficult unsolvable problem; there is no most difficult unsolvable problem. 
Chapter 13 expands on this intriguing fact. It first introduces a view of the class of 
all Turing degrees as a mathematical structure. This eases expression of relations 
between the degrees. Then several properties of this class are proved, revealing the 
highly complex structure of the class. Chapter 14 introduces computably enumer- 
able (c.e.) Turing degrees. It then presents Post’s Problem, posing whether there 
exist c.e. degrees other than 0 below the degree 0’. Then the priority method, dis- 
covered and used by Friedberg and Muchnik to solve Post’s Problem, is described. 
Chapter 15 introduces the arithmetical hierarchy, which gives another, arithmetical 
view of the degrees of unsolvability. Finally, Chapter 16 lists some suggestions for 
further reading. 
Approach


The main lines of the approach are: 


• Presentation levels. I use two levels of presentation, the fast track and detours.
The fast track is a fil rouge through the book and gives a bird’s-eye view of Com- 
putability Theory. It can be read independently of the detours. These contain 
detailed proofs, more demanding themes, additional historical facts, and further 
details, all of which can safely be skipped while reading on the fast track. The 
two levels differ visually: detours are written in small font and are put into Boxes 
(between gray bars, with broken lower bar), so they can easily be skipped or 
skimmed on first reading. Proofs are given on both levels whenever they are dif- 
ficult or long. 


• Clarity. Whenever possible I give the motivation and an explanation of the circumstances
that led to new goals, concepts, methods, or theorems. For example, I
explicitly point out with NB (nota bene) marks those situations and achievements 
that had important impact on further development in the field. Sometimes NB 
marks introduce conventions that are used in the rest of the book. New notions 
are introduced when they are naturally needed. Although I rigorously deduce the- 
orems, I try to make proofs as intelligible as possible; this I do by commenting 
on tricky inferences and avoiding excessive formalism. I give intuitive, informal 
explanations of the concepts, methods, and theorems. Figures are given when- 
ever this can add to the clarity of the text. 


• Contemporary terminology. I use the recently suggested terminology and describe
the reasons for it in the Bibliographic Notes; thus, I use partial computable
(p.c.) functions (instead of partial recursive (p.r.) functions); computable func- 
tions (instead of recursive functions); computably enumerable (c.e.) sets (instead 
of recursively enumerable (r.e.) sets); and computable sets (instead of recursive 
sets). 


• Historical account. I give an extended historical account of the mathematical
and logical roots of Computability Theory. 


• Turing machine. After describing different competing models of computation, I
adopt the Turing machine as the model of computation and build on it. I neither 
formally prove the equivalence of these models, nor do I teach how to program 
Turing machines; I believe that all of this would take excessive space and add 
little to the understanding of Computability Theory. I do, however, rigorously
prove the equivalence of the Turing machine and the RAM model, as the latter 
so closely abstracts real-life, general-purpose computers. 


• Unrestricted computing resources. I decouple Automata Theory and Formal-Language
Theory from Computability Theory. This enables me to consider general
models of computation (i.e., models with unlimited resources) and hence
focus freely on the question “What can be computed?” In this way, I believe, 
Computability Theory can be seen more clearly and it can serve as a natural basis 
for the development of Computational Complexity Theory in its study of “What 
can be computed efficiently?” Although I don’t delve into Computational Com- 
plexity Theory, I do indicate the points where Computational Complexity Theory 
would take over. 


• Shortcuts to relative computability. I introduce oracles in the usual way, after
explaining classical computability. Readers eager to enter relative computability 
might want to start with Part II and continue on the fast track. 


• Practical consequences and applications. I describe the applications of concepts
and theorems, whenever I am aware of them. 


Finally, in describing Computability Theory I do not try to be comprehensive.
Rather, I view the book as a first step towards more advanced texts on Computability 
Theory, or as an introductory text to Computational Complexity Theory. 


Audience 


This book is written at a level appropriate for undergraduate or beginning graduate 
students in computer science or mathematics. It can also be used by anyone pursuing 
research at the intersection of theoretical computer science on the one hand and 
physics, biology, linguistics, or analytic philosophy on the other. 

The only necessary prerequisite is some exposure to elementary logic. However, 
it would be helpful if the reader has had undergraduate-level courses in set theory 
and introductory modern algebra. All that is needed for the book is presented in 
Appendix A, which the reader can use to fill in the gaps in his or her knowledge. 


Teaching 


There are several courses one can teach from this book. A course offering the 
minimum of Computability Theory might cover (omitting boxes) Chaps. 5, 6, 7; 
Sects. 8.1, 8.2, 8.4; and Chap. 9. Such a course might be continued with a course on 
Complexity Theory. An introductory course on Computability Theory might cover 
Parts I and II (omitting most boxes of Part I). A beginning graduate-level course on
Computability Theory might cover all three parts (with all the details in boxes). A 
course offering a shortcut (some 60 pages) to Relative Computability (Chaps. 10 
to 15) might cover Sect. 5.3; Sects. 6.1.1, 6.2.1, 6.2.2; Sects. 6.3, 7.1, 7.2, 7.3; 
Sects. 7.4.1, 7.4.2, 7.4.3; Sects. 8.1, 8.2, 9.1, 9.2; and then Chaps. 10 through 15. 
PowerPoint slides covering all three parts of the text are maintained and available at:


http://lalg.fri.uni-lj.si/fct


Origin 


This book grew out of two activities: (1) the courses in Computability and Compu- 
tational Complexity Theory that I have been teaching at the University of Ljubljana, 
and (2) my intention to write a textbook for a course on algorithms that I also teach. 

When I started working on (2) I wanted to explain the O-notation in a satisfactory 
way, so I planned an introductory chapter that would cover the basics of Computa- 
tional Complexity Theory. But to explain the latter in a satisfactory way, the basics 
of Computability Theory had to be given first. So, I started writing on computability. 
But the story repeated once again and I found myself describing the Mathematical 
Logic of the twentieth century. This regression was due to (i) my awareness that, 
in the development of mathematical sciences, there was always some reason for in- 
troducing a new notion, concept, method, or goal, and (ii) my belief that the text 
should describe such reasons in order to present the subject as clearly as possible. 
Of course, many historical events and logical facts were important in this respect, 
so the chapter on Computability Theory continued to grow. 

At the same time, I was aware that students of Computability and Computational 
Complexity Theory often have difficulty in grasping the meaning and importance of 
certain themes, as well as in linking up the concepts and theorems as a whole. It 
was obvious that before a new concept, method, or goal was introduced, the student 
should be given a historical or purely logical motivation for such a step. In addition, 
giving a bird’s-eye view of the theory developed up to the last milestone also proved 
to be extremely advantageous. 

These observations coincided with my wishes about the chapter on Computabil- 
ity Theory. So the project continued in this direction until the “chapter” grew into a 
text on The Foundations of Computability Theory, which is in front of you. 
Preface to the Second Edition


This is a completely revised edition, with about ninety pages of new material. 
In particular: 


1. To improve the clarity of exposition, some terminological inconsistencies and 
notational redundancies have been removed. Thus all partial computable functions
are now uniformly denoted by (possibly indexed) φ, instead of using ψ for
the functions induced by particular Turing machines. 


2. Various kinds of typos, minor errors, and aesthetic as well as grammatical flaws
have been corrected. 


3. An entirely new Section 3.1.2 on The Notion of Truth has been added in 
Chapter 3. The section describes Alfred Tarski’s definition of the notion of truth 
in formal languages and his attempts to formulate a similar definition for natural 
languages. Since the new section represents a natural bridge between the notion 
of the formal axiomatic system and the notion of its model, it has been inserted 
between the old sections on formal axiomatic systems and their interpretations. 


4. Another major change is in Chapter 5, in Section 5.2.3 on models of computation, 
where the discussion of the Post Machine has been completely rewritten. 


5. To comply with the up-to-date terminology, the recursive functions (as defined 
by Gödel and Kleene) have been renamed to μ-recursive functions. In this way,
general recursive functions (as defined by Gödel and Herbrand) can simply be
called recursive functions. 


6. An entirely new Chapter 16 Computability (Church-Turing) Thesis Revisited 
has been added. The chapter is a systematic and detailed account of the origins, 
evolution, and meaning of this thesis. 


7. Accordingly, some sections with Bibliographic Notes have been augmented. 

8. Some sections containing Problems have been extended with new problems. 
Where required, definitions introducing the key notions and comments on these 
notions have been added. 

9. A Glossary relating to computability theory has been added to help the reader. 

10. Finally, References have been expanded by ninety new bibliographic entries. 
Part I
THE ROOTS OF COMPUTABILITY 
THEORY 


In this part we will describe the events taking place in mathematics at the beginning 
of the twentieth century. Paradoxes discovered in Cantor’s set theory around 1900 
started a crisis in the foundations of mathematics. To reconstruct mathematics free 
from all paradoxes, Hilbert introduced a promising program with formal systems 
as the central idea. Though an end was unexpectedly put to the program in 1931 
by Gödel’s famous theorems, it bequeathed burning questions: What is computing?
What is computable? What is an algorithm? Can every problem be solved algorith- 
mically? These were the questions that led to the birth of Computability Theory. 


Chapter 1
Introduction 


A recipe is a set of instructions describing how to prepare 
something. A culinary recipe for a dish consists of the required 
ingredients and their quantities, equipment and environment 
needed to prepare the dish, an ordered list of preparation steps, 
and the texture and flavor of the dish. 


Abstract The central notions in this book are those of the algorithm and computa- 
tion, not a particular algorithm for a particular problem or a particular computation, 
but the algorithm and computation in general. The first algorithms were discovered 
by the ancient Greeks. Faced with a specific problem, they asked for a set of instruc- 
tions whose execution in the prescribed order would eventually provide the solution 
to the problem. This view of the algorithm sufficed since the fourth century B.C.E., 
which meant there was no need to ask questions about algorithms and computation 
in general. 


1.1 Algorithms and Computation 


In this section we will describe how the concept of the algorithm was traditionally 
intuitively understood. We will briefly review the historical landmarks connected 
with the concept of the algorithm up to the beginning of the twentieth century. 


1.1.1 The Intuitive Concept of the Algorithm and Computation 


Every computational problem is associated with two sets: a set A, which consists 
of all the possible input data to the problem, and a set B, which consists of all the 
possible solutions. For example, consider the problem 


“Find the greatest common divisor of two positive natural numbers.” 


After we have chosen the input data (e.g., 420 and 252), the solution to the problem 
is defined (84). Thus, we can think of the problem as a function f : A → B, which
maps the input data to the corresponding solution to the problem (see Fig. 1.1). 
Fig. 1.1 A problem is viewed as a function mapping the input data to the corresponding solution


What does f look like? How do we compute its value f(x) for a given x? The 
way we do this is described and governed by the associated algorithm (see Fig. 1.2). 


Fig. 1.2 The algorithm directs the processor in order to compute the solution to the problem


Definition 1.1. (Algorithm Intuitively) An algorithm for solving a problem is 
a finite set of instructions that lead the processor, in a finite number of steps, 
from the input data of the problem to the corresponding solution. 


This was an informal definition of the algorithm. As such, it may raise questions, 
so we give some additional explanations. An algorithm is a recipe, a finite list of
instructions that tell how to solve a problem. The processor of an algorithm may be 
a human, or a mechanical, electronic, or any other device, capable of mechanically 
following, interpreting, and executing instructions, while using no self-reflection or 
external help. The instructions must be simple enough so that the processor can 
execute them, and they have to be unambiguous, so that every next step of the ex- 
ecution is precisely defined. A computation is a sequence of such steps. The input 
data must be reasonable, in the sense that they are associated with solutions, so that 
the algorithm can bring the processor to a solution. 

Generally, there exist many algorithms for solving a computational problem. 
However, they differ because they are based on different ideas or different design 
methods. 
Example 1.1. (Euclid’s Algorithm) An algorithm for finding the greatest common divisor of two
numbers was described around 300 B.C.E. by Euclid¹ in his work Elements. The algorithm is:


Divide the larger number by the other. 
Then keep dividing the last divisor by the last remainder—unless the last remainder is 0. 
In that case the solution is the last nonzero remainder.²


For the input data 420 and 252, the computation is: 


420 : 252 = 1 + remainder 168;
252 : 168 = 1 + remainder 84;
168 : 84 = 2 + remainder 0; and the solution is 84.
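
The division steps of Euclid's algorithm can be sketched in Python as follows. This is only an illustrative sketch, not part of the original text: the function name euclid_gcd and the recorded step format (dividend, divisor, quotient, remainder) are assumptions made here so that the computation for 420 and 252 can be replayed.

```python
def euclid_gcd(a: int, b: int):
    """Return (gcd, steps) for two positive natural numbers.

    Each step is a tuple (dividend, divisor, quotient, remainder),
    mirroring one line of the hand computation.
    """
    if a <= 0 or b <= 0:
        raise ValueError("both inputs must be positive natural numbers")

    # Divide the larger number by the other.
    dividend, divisor = max(a, b), min(a, b)
    steps = []
    while True:
        quotient, remainder = divmod(dividend, divisor)
        steps.append((dividend, divisor, quotient, remainder))
        if remainder == 0:
            # The current divisor is the last nonzero remainder
            # (or the smaller input, if the very first division is exact).
            return divisor, steps
        # Keep dividing the last divisor by the last remainder.
        dividend, divisor = divisor, remainder


gcd, steps = euclid_gcd(420, 252)
for dividend, divisor, quotient, remainder in steps:
    print(f"{dividend} : {divisor} = {quotient} + remainder {remainder}")
print("solution:", gcd)
```

For the input data 420 and 252 the sketch performs the same three divisions as the computation above and returns 84.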


Fig. 1.3 Euclid 
(Courtesy: See Preface) 


Observe that there exists another algorithm for this problem that is based on a different idea: 


Factorize both numbers 
and multiply the common factors. 
The solution is this product. 


For the input data 420 and 252, the computation is: 
420 = 2² × 3 × 5 × 7
252 = 2² × 3² × 7
2² × 3 × 7 = 84; and the solution is 84.
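
The factorization-based algorithm can be sketched in the same way. The sketch below is an illustration under stated assumptions: it factorizes by simple trial division (the text does not say how the numbers are to be factorized), it reads "common factors" as the common prime factors taken with their smaller exponent, and the names prime_factors and gcd_by_factorization are hypothetical.

```python
from collections import Counter


def prime_factors(n: int) -> Counter:
    """Factorize n >= 1 by trial division; maps each prime to its exponent."""
    if n < 1:
        raise ValueError("n must be a positive natural number")
    factors = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] += 1
            n //= p
        p += 1
    if n > 1:
        # Whatever remains after trial division is itself a prime factor.
        factors[n] += 1
    return factors


def gcd_by_factorization(a: int, b: int) -> int:
    """Multiply the prime factors common to a and b."""
    # Counter intersection keeps each prime with the smaller of its two exponents.
    common = prime_factors(a) & prime_factors(b)
    product = 1
    for prime, exponent in common.items():
        product *= prime ** exponent
    return product


print(gcd_by_factorization(420, 252))
```

Both sketches compute the same function, but they are based on different ideas, which is the point of the example: one problem, several algorithms.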


1 Euclid, 325-265 B.C.E., Greek mathematician, lived in Alexandria, now in Egypt.


2 This algorithm was probably not discovered by Euclid. The algorithm was probably known by
Eudoxus of Cnidus (408-355 B.C.E.), but it may even pre-date Eudoxus. 
1.1.2 Algorithms and Computations Before the Twentieth Century


Euclid’s algorithm is one of the first known nontrivial algorithms designed by a 
human. Unfortunately, the clumsy Roman number system, which was used in the 
ancient Mediterranean world, hindered computation with large numbers and, conse- 
quently, the application of such algorithms. Only after the positional decimal num- 
ber system was discovered between the first and fourth centuries in India could 
large numbers be written succinctly. This enabled the Persian mathematician al-Khwarizmi³
to describe in the year 825 algorithms for computing with such numbers,
and in 830 algorithms for solving linear and quadratic equations. His name is
the origin of the word algorithm. 


Fig. 1.4 al-Khwarizmi 
(Courtesy: See Preface) 


In the seventeenth century, the first attempts were made to mechanize the algorithmic
solving of particular computational problems of interest. In 1623, Schickard⁴
tried to construct a machine capable of executing the operations + and − on natural
numbers. About twenty years later, a similar machine was successfully constructed by
Pascal.⁵ But Leibniz⁶ saw further into the future. From 1666 he was considering a
universal language (Latin lingua characteristica universalis) that would be capable 
of describing any notion from mathematics or the other sciences. His intention was 
to associate basic notions with natural numbers in such a way that the application 
of arithmetic operations on these numbers would return more complex numbers that 
would represent new, more complex notions. Leibniz also considered a universal 
computing machine (Latin calculus ratiocinator) capable of computing with such 
numbers. In 1671 he even constructed a machine that was able to carry out the operations
+, −, ×, ÷. In short, Leibniz’s intention was to replace certain forms of human
reflection (such as thinking, inferring, proving) with mechanical and mechanized 
arithmetic. 
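
Leibniz's numbering idea can be pictured with a small sketch. It assumes, purely for illustration, that basic notions are assigned distinct prime numbers and that a complex notion is represented by the product of the numbers of its constituents; the text above does not fix any particular encoding, and the notion names, numbers, and function names below are hypothetical.

```python
# Hypothetical assignment of basic notions to distinct primes (illustration only).
BASIC_NOTIONS = {"animal": 2, "rational": 3, "mortal": 5}


def compose(*notions: str) -> int:
    """Represent a complex notion as the product of its basic notions' numbers."""
    number = 1
    for name in notions:
        number *= BASIC_NOTIONS[name]
    return number


def contains(complex_number: int, basic_notion: str) -> bool:
    """A complex notion contains a basic one if its number is divisible by it."""
    return complex_number % BASIC_NOTIONS[basic_notion] == 0


human = compose("animal", "rational")
print(human, contains(human, "animal"), contains(human, "mortal"))
```

Under this assumed encoding, a question about notions ("is a human an animal?") becomes a purely arithmetic question (is 6 divisible by 2?), which is the kind of mechanized reasoning the paragraph describes.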


3 Muhammad ibn Musa al-Khwarizmi, 780-850, Persian mathematician, astronomer, geographer; 
lived in Baghdad. 


4 Wilhelm Schickard, 1592-1635, German astronomer and mathematician. 
5 Blaise Pascal, 1623-1662, French mathematician, physicist, and philosopher.
6 Gottfried Wilhelm Leibniz, 1646-1716, German philosopher and mathematician. 
In 1834, a century and a half later, new and brilliant ideas led Babbage⁷ to his
design of a new, conceptually different computing machine. This machine, called 
the analytical engine, would be capable of executing arbitrary programs (i.e., algo- 
rithms written in the appropriate way) and hence of solving arbitrary computational 
problems. This led Babbage to believe that every computational problem is mechan- 
ically, and hence algorithmically, solvable. 


Fig. 1.5 Blaise Pascal
(Courtesy: See Preface)

Fig. 1.6 Gottfried Leibniz
(Courtesy: See Preface)

Fig. 1.7 Charles Babbage
(Courtesy: See Preface)


Nowadays, Leibniz’s and Babbage’s ideas remind us of several concepts of con- 
temporary Computability Theory that we will look at in the following chapters. 
Such concepts are Gödel’s arithmetization of formal axiomatic systems, the universal
computing machine, computability, and the Computability Thesis. So Leibniz’s
and Babbage’s ideas were far ahead of their time. But they came too early to have
any practical impact on mankind’s comprehension of computation.

As a result, the concept of the algorithm remained firmly at the intuitive level, 
defined only by common sense as in Definition 1.1. It took many events for the 
concept of the algorithm to be rigorously defined and, as a result, for Computability 
Theory to be born. These events are described in the next chapter. 


1.2 Chapter Summary 


The algorithm was traditionally intuitively understood as a recipe, i.e., a finite list 
of directives written in some language that tells us how to solve a problem mechan- 
ically. In other words, the algorithm is a precisely described routine procedure that 
can be applied and systematically followed through to a solution of a problem. Be- 
cause there was no need to define formally the concept of the algorithm, it remained 
firmly at the intuitive, informal level. 


7 Charles Babbage, 1791-1871, British mathematician. 


Chapter 2
The Foundational Crisis of Mathematics 


A paradox is a situation that involves two or more facts or 
qualities that contradict each other. 


Abstract The need for a formal definition of the concept of algorithm was made 
clear during the first few decades of the twentieth century as a result of events tak- 
ing place in mathematics. At the beginning of the century, Cantor’s naive set theory 
was born. This theory was very promising because it offered a common foundation 
to all the fields of mathematics. However, it treated infinity incautiously and boldly. 
This called for a response, which soon came in the form of logical paradoxes. Be- 
cause Cantor’s set theory was unable to eliminate them—or at least bring them un- 
der control—formal logic was engaged. As a result, three schools of mathematical 
thought—intuitionism, logicism, and formalism—contributed important ideas and 
tools that enabled an exact and concise mathematical expression and brought rigor 
to mathematical research. 


2.1 Crisis in Set Theory 


In this section we will describe the axiomatic method that was used to develop 
mathematics since its beginnings. We will also describe how Cantor applied the 
axiomatic method to develop his set theory. Finally, we will explain how paradoxes 
revealed themselves in this theory. 


2.1.1 Axiomatic Systems 


The basic method used to acquire new knowledge in mathematics and similar disci- 
plines is the axiomatic method. Euclid was probably the first to use it when he was 
developing his geometry. 
Axiomatic Method


When using the axiomatic method, we start our treatment of the field of interest by 
carefully selecting a few basic notions and making a few basic statements,¹ called
axioms (see Fig. 2.1). An axiom is a statement that asserts either that a basic notion 
has a certain property, or that a certain relation holds between certain basic notions. 
We do not try to define the basic notions, nor do we try to prove the axioms. The 
basic notions and axioms form our initial theory of the field. 

We then start developing the theory, i.e., extending the initial knowledge of the 
field. We do this systematically. This means that we must define every new no- 
tion in a clear and precise way, using only basic or previously defined notions. It 
also means that we must try to prove every new proposition,² i.e., deduce³ it only
from axioms or previously proven statements. Here, a proof is a finite sequence of
mental steps, i.e., inferences,⁴ that end in the realization that the proposition is a
logical consequence of the axioms and previously proven statements. A provable 
proposition is called a theorem of the theory. The process of proving is informal, 
content-dependent in the sense that each conclusion must undoubtedly follow from 
the meaning of its premises. 

Informally, the development of a theory is a process of discovering (i.e., de- 
ducing) new theorems—in Nagel and Newman’s words, as Columbus discovered 
America—and defining new notions in order to facilitate this process. (This is the 
Platonic view of mathematics; see Box 2.1.) We say that axioms and basic notions 
constitute an axiomatic system. 


Fig. 2.1 A theory has axioms, theorems, and unprovable statements


Box 2.1 (Platonism). 


According to the Platonic view, mathematics does not create new objects but, instead, discovers
already existing objects. These exist in the non-material world of abstract Ideas, which is accessible
only to our intellect. For example, the idea of the number 2 exists per se, capturing the state of
“twoness”, i.e., the state of any gathering of anything and something else, and nothing else. In the
material world, Ideas present themselves in terms of imperfect copies. For example, the Idea of
a triangle is presented by various copies, such as the figures △, ▽, ◁, ▷, and love triangles too. It
can take considerable instinct to discover an Idea, the comprehension of which is consistent with
the sensation of its copies in the material world. The agreement of these two is the criterion for
deciding whether the Idea really exists.


1 A statement is something that we say or write that makes sense and is either true or false.

2 A proposition is a statement for which a proof is either required or provided.

3 A deduction is the process of reaching a conclusion about something because of other things
(called premises) that we know to be true.

4 An inference is a conclusion that we draw about something by using information that we already
have about it. It is also the process of coming to a conclusion.


Evident Axiomatic Systems 


From the time of Euclid to the mid-nineteenth century it was required that axioms 
be statements that are in perfect agreement with human experience in the particular 
field of interest. The validity of such axioms was beyond any doubt, because they 
were clearly confirmed by reality. Thus, no proofs of axioms were required. Such 
axioms are evident. Euclidean elementary geometry is an example of an evident 
axiomatic system, because it talks of points, lines, and planes, which are evident 
idealizations of the corresponding real-world objects. 

However, in the nineteenth century serious doubts arose as to whether evident ax- 
iomatic systems are always appropriate. This is because it was found that experience 
and intuition may be misleading. (For an example of such a situation see Box 2.2.) 
This led to the concept of the hypothetical axiomatic system. 


Box 2.2 (Euclid’s Fifth Axiom). 


In two-dimensional geometry a line parallel to a given line L is a line that does not intersect with L. 
Euclid’s fifth axiom, also called the Parallel Postulate, states that at most one parallel can be drawn 
through any point not on L. (In fact, Euclid postulated this axiom in a different but equivalent form.) 

To Euclid and other ancients the fifth axiom seemed less obvious than the other four of Euclid’s 
axioms. This is because a parallel line can be viewed as a line segment that never intersects with L, 
even if it is extended indefinitely. The fifth axiom thus implicitly speaks about a certain occurrence 
in arbitrarily removed regions of the plane, that is, that the segment and L will never meet. However, 
since Aristotle the ancients were well aware that one has to be careful when dealing with infinity. 
For example, they were already familiar with the notion of asymptote, a line that approaches a 
given curve but meets the curve only at infinity. 

To avoid the vagueness and controversy of Euclid’s fifth axiom, they undertook to deduce it 
from Euclid’s other four axioms; these caused no disputes. However, all attempts were unsuccessful
until 1868, when Beltrami⁵ proved that Euclid’s fifth axiom cannot be deduced from the other
four axioms. In other words, Euclid’s fifth axiom is independent of Euclid’s other four axioms. 


NB The importance of Beltrami’s discovery is that it does not belong to geometry, but rather to the 
science about geometry, and, more generally, to metamathematics, the science about mathematics. 
About fifty years later, metamathematics would come to the fore more explicitly and play a key role 
in the events that led to a rigorous definition of the notion of the algorithm. 


Since the eleventh century, Persian and Italian mathematicians had tried to prove Euclid’s fifth
axiom indirectly. They tried to refute all of its alternatives. These stated either that there are no
parallels, or that there are several different parallels through a given point. When they considered
these alternatives they unknowingly discovered non-Euclidean geometries, such as elliptic and
hyperbolic geometry. But they cast them away as having no evidentness in reality. According to
the usual experience they viewed reality as a space where only Euclidean geometry can rule.


5 Eugenio Beltrami, 1835-1900, Italian mathematician.

In the nineteenth century, Lobachevsky,⁶ Bolyai,⁷ and Riemann⁸ thought of these geometries
as true alternatives. They showed that if Euclid’s fifth axiom is replaced by a different axiom, then 
a different non-Euclidean geometry is obtained. In addition, there exist in reality examples, also 
called models, of such non-Euclidean geometries. For instance, Riemann replaced Euclid’s fifth 
axiom with the axiom that states that there is no parallel to a given line L through a given point. 
The resulting geometry is called elliptic and is modeled by a sphere. In contrast to this, Bolyai and 
Lobachevsky selected the axiom that allows several parallels to L to pass through a given point. 
The resulting hyperbolic geometry holds, for example, on the surface of a saddle. 


NB These discoveries shook the traditional standpoint that axioms should be obvious and clearly 
agree with reality. It became clear that intuition and experience may be misleading. 


Hypothetical Axiomatic Systems 


After the realization that instinct and experience can be delusive, mathematics grad- 
ually took a more abstract view of its research subjects. No longer was it interested 
in the (potentially slippery) nature of the basic notions used in an axiomatic system. 
For example, arithmetic was no longer concerned with the question of what a natural 
number really is, and geometry was no longer interested in what a point, a line, and 
a plane really are. Instead, mathematics focused on the properties of and relations 
between the basic notions, which could be defined without specifying what the basic 
notions are in reality. A basic notion can be any object that fulfills all the conditions 
given by the axioms. 

Thus the role of the axioms has changed; now an axiom is only a hypothesis, a 
speculative statement about the basic notions taken to hold, although nothing is said 
about the true nature and existence of such basic notions. Such an axiomatic system 
is called hypothetical. For example, by using nine axioms Peano in 1889 described 
properties and relations typical of natural numbers without explicitly defining a nat- 
ural number. Similarly, in 1899 Hilbert developed elementary geometry, where no 
explicit definition of a point, line, and plane is given; instead, these are defined im- 
plicitly, only as possible objects that satisfy the postulated axioms. 

Because the nature of basic notions lost its importance, the requirement for
the evidentness of axioms and for their verifiability in reality was abandoned as well. The
obvious link between the subject of mathematical treatment and reality vanished. 
Instead of axiomatic evidentness the fertility of axioms came to the fore, i.e., the 
number of theorems deduced, their expressiveness, and their influence. The reason- 
ableness and applicability of the theory developed was evaluated by the importance 
of successful interpretations, i.e., applications of the theory to various domains of 
reality. Depending on this, the theory was either accepted, corrected, or cast off. 


6 Nikolai Ivanovich Lobachevsky, 1792-1856, Russian mathematician and geometer. 
7 Janos Bolyai, 1802-1860, Hungarian mathematician. 
8 Georg Friedrich Bernhard Riemann, 1826-1866, German mathematician.


NB This freedom, which arose from the hypothetical axiomatic system, enabled
scientists to make attempts that eventually bred important new areas of mathematics.
Set theory is such an example.⁹


2.1.2 Cantor’s Naive Set Theory 


A theory with a hypothetical axiomatic system that will play a special role in what 
follows was the naive set theory founded by Cantor.¹⁰ Let us take a quick look at
this theory. 


Fig. 2.2 Georg Cantor
(Courtesy: See Preface) 


Basic Notions, Concepts, and Axioms 


In 1895 Cantor defined the concept of a set as follows. 


Definition 2.1. (Cantor’s Set) A set is any collection of definite, distinguishable 
objects of our intuition or of our intellect to be conceived as a whole (i.e., 
regarded as a single unity). 


Thus, an object can be any thing or any notion, such as a number, a pizza, or even 
another set. If an object x is in a set S, we say that x is a member of S and write 
x ∈ S. When x is not in S, it is not a member of S, so x ∉ S. Given an object x and
a set S, either x ∈ S or x ∉ S; there is no third choice. This is known as the Law of
Excluded Middle.¹¹


Cantor did not develop his theory from explicitly written axioms. However, later 
analyses of his work revealed that he used three principles in the same fashion as 
axioms. For this reason we call these principles the Axioms of Extensionality, 
Abstraction, and Choice. Let us describe them. 


9 Another example of a theory with a hypothetical axiomatic system is group theory. 
10 Georg Cantor, 1845-1918, German mathematician. 


11 The law states that for any logical statement, either that statement is true, or its negation is;
there is no third possibility (Latin tertium non datur).


Axiom 2.1 (Extensionality). A set is completely determined by its members.


Thus a set is completely described if we list all of its members (by convention
between braces “{” and “}”). For instance, {○, ◁, □} is a set whose members are ○, ◁, □,
while one of the three members of the set {○, {◁, ▷}, □} is itself a set. When a set has
many members, say a thousand, it may be impractical to list all of them; instead, we
may describe the set perfectly by stating the characteristic property of its members.
Thus a set of objects with the property P is written as {x | x has the property P} or
as {x | P(x)}. For instance, {x | x is a natural number ∧ 1 ≤ x ≤ 1,000}.


What property can P be? Cantor’s liberal-minded standpoint in this matter is 
summed up in the second axiom: 


Axiom 2.2 (Abstraction). Every property determines a set. 


If there is no object with a given property, the set is empty, that is, {}. Due to the 
Axiom of Extensionality there is only one empty set; we denote it by ∅.


Cantor’s third principle is summed up in the third axiom: 


Axiom 2.3 (Choice). Given any set F of nonempty pairwise disjoint sets, there is a 
set that contains exactly one member of each set in F. 


We see that the set and the membership relation ∈ are such basic notions that
Cantor defined them informally, in a descriptive way. Having done this he used
them to rigorously define other notions in a true axiomatic manner. For example, he
defined the relations = and ⊆ on sets. Specifically, two sets A and B are equal (i.e.,
A = B) if they have the same objects as members. A set A is a subset of a set B
(i.e., A ⊆ B) if every member of A is also a member of B. Cantor also defined the
operations ¯ (complement), ∪, ∩, −, and 2^(·) that construct new sets from existing ones.
For example, if A and B are two sets, then the complement Ā, the union A ∪ B, the
intersection A ∩ B, the difference A − B, and the power set 2^A are also sets.
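
As an informal illustration only, finite sets and the operations just listed can be modelled with Python's built-in frozenset. The sketch assumes a small finite universe U, introduced here solely so that the complement is defined; the names U, A, B, and power_set are hypothetical and do not come from Cantor's theory.

```python
from itertools import chain, combinations

# Assumed finite universe; the complement is taken relative to it.
U = frozenset(range(1, 11))

# Set-builder notation {x | P(x)} becomes a comprehension over U.
A = frozenset(x for x in U if x % 2 == 0)   # even numbers in U
B = frozenset(x for x in U if x <= 4)       # numbers up to 4

def power_set(s):
    """Return the set of all subsets of the finite set s."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return frozenset(frozenset(c) for c in subsets)

complement_A = U - A    # complement of A relative to U
union = A | B
intersection = A & B
difference = A - B

# Extensionality: order and repetition of the listed members do not matter.
same = frozenset([1, 2, 2, 3]) == frozenset([3, 1, 2])
```

A frozenset is used rather than a set because it is hashable, so a set can itself be a member of another set, as in the power set.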


Applications 


Cantor’s set theory very quickly found applications in different fields of mathematics.
For example, Kuratowski¹² used sets to define the ordered pair (x,y), i.e., a set
of two elements with one being the first and the other the second in some order.
The definition is (x,y) = {{x}, {x,y}}. (The ordering of {x} and {x,y} is implicitly
imposed by the relation ⊆, since {x} ⊆ {x,y} but not vice versa.) Two ordered
pairs are equal if they have equal first elements and equal second elements. Now
the Cartesian product A × B could be defined as the set of all ordered pairs (a,b),
where a ∈ A and b ∈ B. The sets A and B need not be distinct. When A = B, the
notation A² was used for A × A and, in general, A^n = A^(n−1) × A, where A^1 = A.
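
The following minimal sketch shows how Kuratowski's definition behaves on small examples. It models sets with Python's frozenset; the helper names pair, first, second, and cartesian_product are hypothetical and chosen only for this illustration.

```python
def pair(x, y):
    """Kuratowski ordered pair (x, y) = {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})

def first(p):
    """First component: the only element common to all members of p."""
    (x,) = frozenset.intersection(*p)
    return x

def second(p):
    """Second component: the element that is not the first, if there is one."""
    rest = frozenset.union(*p) - {first(p)}
    # For (x, x) the pair collapses to {{x}}, so the second equals the first.
    return next(iter(rest)) if rest else first(p)

def cartesian_product(a, b):
    """A x B as the set of all Kuratowski pairs (x, y) with x in a and y in b."""
    return frozenset(pair(x, y) for x in a for y in b)
```

Although both members of a pair are unordered sets, pair(1, 2) and pair(2, 1) are different sets, which is exactly what makes the encoding usable as an ordered pair.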


12 Kazimierz Kuratowski, 1896-1980, Polish mathematician and logician.


Many other important notions and concepts that were in common use although
informally defined were at last rigorously defined in terms of set theory, e.g., the
concepts of function and natural number. For example, a function f : A → B is a
set of ordered pairs (a,b), where a ∈ A and b = f(a) ∈ B, and there are no two
ordered pairs with equal first components and different second components. Based
on this, set-theoretic definitions of injective, surjective, and bijective functions were
easily made. For example, a bijective function is a function f : A → B whose set
of ordered pairs contains, for each b ∈ B, at least one ordered pair with the second
component b (surjectivity), and there are no two ordered pairs having different first
components and equal second components (injectivity).
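
The set-theoretic definitions above translate almost literally into checks on a finite set of pairs. The sketch below is an illustration under stated assumptions: pairs are represented as Python tuples for readability, each relation is passed as a set of such tuples, and totality (every element of A has an image) is an extra assumption made explicit in is_total. All function names are hypothetical.

```python
def is_single_valued(pairs):
    """No two pairs have equal first and different second components.

    pairs must be a set of (a, b) tuples, so each pair occurs only once.
    """
    firsts = [a for (a, _) in pairs]
    return len(firsts) == len(set(firsts))

def is_total(pairs, domain):
    """Assumption of this sketch: every element of the domain has an image."""
    return {a for (a, _) in pairs} == set(domain)

def is_injective(pairs):
    """No two pairs have different first and equal second components."""
    seconds = [b for (_, b) in pairs]
    return len(seconds) == len(set(seconds))

def is_surjective(pairs, codomain):
    """Every element of the codomain occurs as a second component."""
    return {b for (_, b) in pairs} == set(codomain)

def is_bijective(pairs, domain, codomain):
    return (is_single_valued(pairs) and is_total(pairs, domain)
            and is_injective(pairs) and is_surjective(pairs, codomain))

f = {("circle", "a"), ("triangle", "b"), ("square", "c")}
g = {("circle", "a"), ("triangle", "a"), ("square", "c")}
```

Here f is a bijection between the two three-element sets, while g is a function that is neither injective nor surjective.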

Von Neumann used sets to construct natural numbers as follows. Consider the
number 2. We may imagine that it represents the state of “twoness”, i.e., the gathering
of one element and one more different element, and nothing else. Since the set
{0,1} is an example of such a gathering, we may define 2 = {0,1}. Similarly, if we
imagine 3 to represent “threeness”, we may define 3 = {0,1,2}. Continuing in this
way, we arrive at the general definition n = {0,1,2,...,n−1}. So a natural number
can be defined as the set of all of its predecessors. What about the number 0? Since
0 has no natural predecessors, the corresponding set is empty. Hence the definition
0 = ∅. We now see that natural numbers can be constructed from ∅ as follows: 0 ≝ ∅;
1 = {∅}; 2 = {∅,{∅}}; 3 = {∅,{∅},{∅,{∅}}}; ...; n+1 ≝ n ∪ {n}; ... Based on
this, other definitions and constructions followed (e.g., of rational and real numbers).
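
The construction n+1 ≝ n ∪ {n} can be tried out directly. The sketch below is only an illustration with Python frozensets; the names EMPTY, successor, and von_neumann are hypothetical.

```python
EMPTY = frozenset()

def successor(n):
    """Return n + 1, defined as the union of n and {n}."""
    return n | frozenset({n})

def von_neumann(k):
    """Build the set that represents the natural number k."""
    n = EMPTY
    for _ in range(k):
        n = successor(n)
    return n

three = von_neumann(3)
```

In this representation the set for n has exactly n members, and m is smaller than n precisely when the set for m is a member (and also a proper subset) of the set for n.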


NB Cantor’s set theory offered a simple and unified approach to all fields of math- 
ematics. As such it promised to become the foundation of all mathematics. 


But Cantor’s set theory also brought new, quite surprising discoveries about the 
so-called cardinal and ordinal numbers. As we will see, these discoveries resulted 
from Cantor’s Axiom of Abstraction and his view of infinity. Let us go into details. 


Cardinal Numbers 


Intuitively, two sets have the same “size” if they contain the same number of elements.
Without any counting of their members we can assert that two sets are
equinumerous, i.e., of the same “size”, if there is a bijective function mapping one
set onto the other. This function pairs every member of one set with exactly one
member of the other set, and vice versa. Such sets are said to have the same cardinality.
For example, the sets {○, ◁, □} and {a,b,c} have the same cardinality because
{(○,a), (◁,b), (□,c)} is a bijective function. In this example, each of the sets has
cardinality (“size”) 3, a natural number. We denote the cardinality of a set S by |S|.


Is the cardinality always a natural number? Cantor’s Axiom of Abstraction guarantees
that the set S_P = {x | P(x)} exists for any given property P. Hence, it exists
also for a P that is shared by infinitely many objects. For example, if we put
P ≡ “is a natural number,” we get the set of all natural numbers. This set is not only
an interesting and useful mathematical object, but (according to Cantor) it also ex- 
ists as a perfectly defined and accomplished unity. Usually, it is denoted by N. It is 
obvious that the cardinality of N cannot be a natural number because any such num- 
ber would be too small. Thus Cantor was forced to introduce a new kind of number 
and designate it with some new symbol not used for natural numbers. He denoted 
this number by ℵ₀ (read aleph zero¹³). Cardinality of sets can thus be described by
the numbers that Cantor called cardinal numbers. A cardinal number (or cardinal
for short) can either be finite (in that case it is natural) or transfinite, depending on
whether it measures the size of a finite or infinite set. For example, ℵ₀ is a transfinite
cardinal that describes the size of the set N as well as the size of any other infinite 
set whose members can all be listed in a sequence. 

Does every infinite set have the cardinality ℵ₀? Cantor discovered that this is
not so. He proved (see Box 2.3) that the cardinality of a set S is strictly less than
the cardinality of its power set 2^S, even when S is infinite! Consequently, there
are larger and larger infinite sets whose cardinalities are larger and larger transfinite
cardinals, and this never ends. He denoted these transfinite cardinals by ℵ₁, ℵ₂, ....
Thus, there is no largest cardinal.

Cantor also discovered (using diagonalization, a method he invented; see Sect. 9.1)
that there are more real numbers than natural ones, i.e., ℵ₀ < c, where c denotes the
cardinality of R, the set of real numbers. (For the proof see Example 9.1 on p. 207.)
He also proved that c = 2^ℵ₀. But where is c relative to ℵ₀, ℵ₁, ℵ₂, ...? Cantor conjectured
that c = ℵ₁, that is, 2^ℵ₀ = ℵ₁. This would mean that there is no other transfinite
cardinal between ℵ₀ and c and consequently there is no infinite set larger than N and
smaller than R. Yet, no one succeeded in proving or disproving this conjecture, until
Gödel and Cohen finally proved that neither can be done (see Box 4.3 on p. 65).
Cantor’s conjecture is now known as the Continuum Hypothesis.


Box 2.3 (Proof of Cantor’s Theorem). 
Cantor’s Theorem states: |S| < |2^S|, for every set S.


Proof. (a) First, we prove that |S| ≤ |2^S|. To do this, we show that S is equinumerous to a subset
of 2^S. Consider the function f : S → 2^S defined by f : x ↦ {x}. This is a bijection from S onto
{{x} | x ∈ S}, which is a subset of 2^S. (b) Second, we prove that |S| ≠ |2^S|. To do this, we show
that there is no bijection from S onto 2^S. So let g : S → 2^S be an arbitrary function. Then g cannot
be surjective (and hence, neither is it bijective). To see this, let 𝒩 be a subset of S defined by 𝒩 =
{x ∈ S | x ∉ g(x)}. Of course, 𝒩 ∈ 2^S. But 𝒩 is not a g-image of any member of S. Suppose it were.
Then there would be an m ∈ S such that g(m) = 𝒩. Where would m be relative to 𝒩? If m ∈ 𝒩,
then m ∉ g(m) (by definition of 𝒩), and hence m ∉ 𝒩 (as g(m) = 𝒩)! Conversely, if m ∉ 𝒩,
then m ∉ g(m) (as g(m) = 𝒩), and hence m ∈ 𝒩 (by definition of 𝒩)! This is a contradiction. We
conclude that g is not a surjection, and therefore neither is it a bijection. Since g was an arbitrary
function, we conclude that there is no bijection from S onto 2^S.
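
For a small finite S the diagonal argument of part (b) can be checked exhaustively. The sketch below is an illustration, not a proof: it enumerates every function g : S → 2^S, builds the set 𝒩 = {x ∈ S | x ∉ g(x)}, and confirms that 𝒩 is never an image of g. The names power_set, diagonal_set, and no_surjection_exists are hypothetical, and g is represented as a Python dict.

```python
from itertools import chain, combinations, product

def power_set(s):
    """Return a list of all subsets of the finite set s."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return [frozenset(c) for c in subsets]

def diagonal_set(s, g):
    """Return {x in s | x not in g(x)} for g given as a dict from s to subsets of s."""
    return frozenset(x for x in s if x not in g[x])

def no_surjection_exists(s):
    """Check every g : S -> 2^S and confirm the diagonal set is never an image of g."""
    elems = sorted(s)
    subsets = power_set(s)
    for images in product(subsets, repeat=len(elems)):
        g = dict(zip(elems, images))
        if diagonal_set(s, g) in g.values():
            return False
    return True
```

For S = {0, 1, 2} the loop inspects all 8³ = 512 functions. The search grows very quickly with the size of S, so it is meant only for very small sets.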


13 ℵ is the first symbol of the Hebrew alphabet.


Ordinal Numbers


We have seen that one can introduce order into a set of two elements. This can easily 
be done with other finite and infinite sets, and it can be done in many different ways. 
Of special importance to Cantor was the so-called well-ordering, because this is the 
way natural numbers are ordered with the usual relation ≤. For example, each of the
sets {0,1,2} and N is well-ordered with ≤, that is, 0 < 1 < 2 and 0 < 1 < 2 < 3 < ···.
(Here < is the strict order corresponding to ≤.) But well-ordering can also be found
in other sets and for relations other than the usual ≤. When two well-ordered sets
differ only in the naming of their elements or relations, we say that they are similar. 

Cantor’s aim was to classify all the well-ordered sets according to their similarity. 
In doing so he first noticed that the usual well-ordering of the set {0,1,2,...,n}, 
n ∈ N, can be represented by a single natural number n + 1. (We can see this if we
construct n+ 1 from 0, as von Neumann did.) For example, the number 3 represents 
the ordering 0 < 1 < 2 of the set {0,1,2}. But the usual well-ordering of the set N 
cannot be described by a natural number, as any such number is too small. Once 
again a new kind of a “number” was required and a new symbol for it was needed. 
Cantor denoted this number by ω and called it an ordinal number.

Well-ordering of a set can thus be described by the ordinal number, or ordinal for 
short. An ordinal is either finite (in which case it is natural) or transfinite, depending 
on whether it represents the well-ordering of a finite or infinite set. For example, 
ω is the transfinite ordinal that describes the usual well-ordering in N. Of course,
in order to use ordinals in classifying well-ordered sets, Cantor required that two 
well-ordered sets have the same ordinal iff they are similar. (See details in Box 2.4.) 
Then, once again, he proved that there are larger and larger transfinite ordinals de- 
scribing larger and larger well-ordered infinite sets—and this never ends. There is 
no largest ordinal. 


NB With his set theory, Cantor boldly entered a curious and wild world of infinities. 


2.1.3 Logical Paradoxes 


Unfortunately, the great leaps forward made by Cantor’s set theory called for a re- 
sponse. This came around 1900 when logical paradoxes were suddenly discovered 
in this theory. A paradox (or contradiction) is an unacceptable conclusion derived by 
apparently acceptable reasoning from apparently acceptable premises (see Fig. 2.3). 


Burali-Forti’s Paradox. The first logical paradox was discovered in 1897 by 
Burali-Forti.¹⁴ He showed that in Cantor’s set theory there exists a well-ordered
set Ω whose ordinal number is larger than itself. But this is a contradiction. (See
details in Box 2.4.) 


14 Cesare Burali-Forti, 1861-1931, Italian mathematician.


Cantor’s Paradox. A similar paradox was discovered by Cantor himself in 1899.
Although he proved that, for any set S, the cardinality of the power set 2^S is strictly
larger than the cardinality of S, he was forced to admit that this cannot be true of
the set 𝒰 of all sets. Namely, the existence of 𝒰 was guaranteed by the Axiom of
Abstraction, just by defining 𝒰 = {x | x = x}. But if the cardinality of 𝒰 is less than
the cardinality of 2^𝒰, which also exists, then 𝒰 is not the largest set (which 𝒰 is
supposed to be since it is the set of all sets). This is a contradiction.


Russell’s Paradox. The third paradox was found in 1901 by Russell.¹⁵ He found
that in Cantor’s set theory there exists a set that both is and is not a member of
itself. How? Firstly, the set ℛ defined by


ℛ = {S | S is a set and S does not contain itself as a member}


must exist because of the Axiom of Abstraction. Secondly, the Law of Excluded
Middle guarantees that ℛ either contains itself as a member (i.e., ℛ ∈ ℛ), or does
not contain itself as a member (i.e., ℛ ∉ ℛ). But then, using the definition of ℛ, each
of the two alternatives implies the other, that is, ℛ ∈ ℛ ⟺ ℛ ∉ ℛ. Hence, each of
the two is both a true and a false statement in Cantor’s set theory.


Fig. 2.3 A paradox is an unacceptable statement or situation because it defies reason; for example,
because it is (or at least seems to be) both true and false
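
Russell's construction has a loose programming analogue, offered here only as an informal illustration and not as part of set theory. The sketch assumes that a “set” is modelled by its membership test, i.e., by a function that answers whether its argument is a member. Under that assumption the membership test of ℛ asks whether its argument does not contain itself, and the question “is ℛ a member of ℛ?” never settles on an answer: in Python the evaluation recurses until the interpreter gives up. The names russell, contains_itself, nothing, and everything are hypothetical.

```python
def russell(s):
    """Membership test for R: s belongs to R iff s does not belong to s."""
    return not s(s)

def nothing(_):
    """Membership test of the empty set: no object belongs to it."""
    return False

def everything(_):
    """Membership test of the 'set of everything': every object belongs to it."""
    return True

def contains_itself(s):
    """Try to decide whether s belongs to s; return None if no answer is reached."""
    try:
        return s(s)
    except RecursionError:
        return None
```

For nothing and everything the self-membership question has a definite answer, so russell classifies them without difficulty. Only the self-application contains_itself(russell) fails to produce either True or False.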


Why Do We Fear Paradoxes? 


Suppose that a theory contains a logical statement such that both the statement and 
its negation can be deduced. Then it can be shown (see Sect. 4.1.1) that any other 
statement of the theory can be deduced as well. So in this theory everything is de- 
ducible! This, however, is not as good as it may seem at first glance. Since deduction 
is a means of discovering truth (i.e., what is deduced is accepted as true) we see that
in such a theory every statement is true. But a theory in which everything is true has 
no cognitive value and is of no use. Such a theory must be cast off. 


15 Bertrand Russell, 1872-1970, British mathematician, logician, and philosopher.


Box 2.4 (Burali-Forti’s Paradox).


A set S is well-ordered by a relation ≺ if the following hold: 1) a ⊀ a; 2) a ≠ b ⇒ a ≺ b ∨ b ≺ a;
and 3) every nonempty X ⊆ S has m ∈ X such that m ≺ x for every other x ∈ X. For example, N
is well-ordered with the usual relation < on natural numbers. Well-ordering is a special case of the
so-called linear ordering, i.e., a well-ordered set is also linearly ordered. For example, Z, the set
of integers, is linearly ordered by the usual relation <.

Suppose we do not want to distinguish between two linearly ordered sets that differ only in the 
naming of their elements or relations. We want to consider such sets as being similar, because they 
obviously share the same “type of order”. 

Let us define the notion “type of order” precisely. Let two sets A and B be linearly ordered
with relations ≺_A and ≺_B, respectively. We say that A and B are similar if there is a bijection
f : A → B such that a ≺_A b ⟺ f(a) ≺_B f(b). The function f renames the elements of A to the
elements of B while respecting both relations. We can easily prove that similarity is an equivalence
relation between linearly ordered sets. So we can define the order type to be an equivalence class of 
similar, linearly ordered sets. Informally, an order type is the feature shared by all linearly ordered 
sets that differ only in the naming of their elements and relations. 

Having defined the order types we might want to compare them. Unfortunately, they may not be
comparable. It can be shown, however, that order types of well-ordered sets are themselves linearly
ordered by some relation ≺_o. (Actually, ≺_o is the usual set-membership relation ∈.) Because such
order types are ordered in a similar way to integers, we call them ordinal numbers (or ordinals for
short). Hence, the definition: An ordinal is an equivalence class of similar well-ordered sets. For
example, sets similar to {0,1,...,n} have the same ordinal; we denote it by the natural number
n + 1. This cannot be done with sets similar to N, so we use ω to denote their ordinal.

For each ordinal α there is exactly one ordinal α′ ≝ α ∪ {α} that is the ≺_o-successor of α. (We
also denote α′ by α + 1.) It follows that there is no ≺_o-largest ordinal.

This is where Burali-Forti entered. He proved that Cantor’s set theory allows the construction
of a set Ω of all the ordinals. He also showed that such an Ω leads to a paradox. Namely, Ω would
not only be linearly ordered by ≺_o, but also well-ordered by ≺_o. As such, Ω would be associated
with the corresponding ordinal, say α_Ω. But where would α_Ω be relative to Ω? Since Ω is the set
of all the ordinals, it must be that α_Ω ∈ Ω. On the other hand, α_Ω must be ≺_o-larger than any
member of Ω, and therefore larger than itself.
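
The three conditions that open this box can be tested mechanically when the set is finite. The following sketch is an illustration under that assumption; the relation is passed as a hypothetical function less(a, b), and the helper names are not taken from the text.

```python
from itertools import chain, combinations

def nonempty_subsets(s):
    """Yield every nonempty subset of the finite set s as a tuple."""
    items = list(s)
    return chain.from_iterable(
        combinations(items, k) for k in range(1, len(items) + 1)
    )

def is_well_ordered(s, less):
    """Check conditions 1) to 3) for a finite set s and a relation less(a, b)."""
    irreflexive = all(not less(a, a) for a in s)
    comparable = all(less(a, b) or less(b, a) for a in s for b in s if a != b)
    has_least = all(
        any(all(less(m, x) for x in sub if x != m) for m in sub)
        for sub in nonempty_subsets(s)
    )
    return irreflexive and comparable and has_least

def proper_divisor(a, b):
    """a divides b and a differs from b; not every two numbers are comparable."""
    return a != b and b % a == 0
```

On finite sets condition 3) adds little once 1) and 2) hold together with transitivity; it becomes decisive only for infinite sets such as Z, which an exhaustive check like this one cannot cover.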


2.2 Schools of Recovery 


In this section we will describe the three main schools of mathematical thought that 
significantly contributed to the struggle against paradoxes in mathematics. These are 
intuitionism, logicism, and formalism. We will show how their discoveries were syn- 
thesized in the concept of a formal axiomatic system and then in a clear awareness 
that a higher, metamathematical language is needed to investigate such systems.


2.2.1 Slowdown and Revision


The discovery of the paradoxical sets Ω, 𝒰, and ℛ was shocking, because Cantor’s
set theory was supposed to become a firm foundation for all other fields of
mathematics and should, therefore, have been free of paradoxes. But the simplicity
of Russell’s Paradox, which used only two basic notions of set and membership
relation, revealed that paradoxes originated deep in Cantor’s theory, in the very
definition of the concept of a set. It was this definition of a set and the unrestricted use
of the Axiom of Abstraction that allowed the existence of the sets Ω, 𝒰, and ℛ that,
in the end, caused and revealed paradoxical situations. So it was clear that objects
like Ω, 𝒰, and ℛ should not be recognized as existing sets.

Therefore, Cantor’s naive definition of the concept of a set (Sect. 2.1.2) should be 
restricted somehow. But this was easier said than done. Namely, Cantor’s definition 
of a set was so natural and of such common sense that it was far from clear how to 
restrict it and, at the same time, retain all the sound parts of the theory. If a set was not
what Cantor thought it to be, then what was it? And what was it not?

This once again triggered a critical reflection about the basic concepts, notions, 
principles, methods, and tools of set theory and logic, which might be sources of 
paradoxes. The aim was to make the necessary corrections to them, so that they, as 
a whole, would again act as a foundation for the development of mathematics and 
other axiomatic areas of science, but this time safe from all paradoxes. It turned 
out that no universally accepted resolutions could be expected. The critiques and 
proposals went in several directions, of which the three mainstream directions were 
called intuitionism, logicism, and formalism. Because they all contributed to future 
events, we briefly review them. 


2.2.2 Intuitionism 


Intuitionism argued for greater mathematical rigor in the process of proving and it 
advocated a non-Platonic view that the existence of a mathematical object is closely 
connected to the existence of its mental construction. 


Fig. 2.4 Jan Brouwer 
(Courtesy: See Preface)


The school was initiated by Brouwer¹⁶ and then further developed by his student
Heyting.¹⁷ Brouwer was critical of the way in which Cantor’s mathematics viewed
the existence of infinite sets, and of the way in which mathematics was using the 
Law of Excluded Middle. He proposed a thorough change of this view as well as 
severe restrictions on the use of the law. Specifically, unlike Cantor, who considered 
infinite sets as actualities, i.e., accomplished objects, intuitionism advocated the 
classical point of view that infinite sets are no more than potentialities, i.e., objects 
that are always under construction, making it possible to construct as many members 
as needed, but never all. This view called for a change in the way that the existence 
of objects in infinite sets should be proven: an object is recognized as a member of 
an infinite set if and only if the object has been constructed or the existence of such 
a construction is beyond doubt. We give the details in Box 2.5. 

Using these principles, intuitionism reconstructed several parts of classical math- 
ematics and showed that such intuitionistic mathematics is free of all known para- 
doxes. Unfortunately, the price for this was rather high: large parts of mathematics 
had to be cast off, because it seemed impossible to reconstruct them according to 
intuitionistic principles. In addition, in the reconstructed mathematics, surprising 
changes occurred; for example, every (constructed) function is continuous. 

Not surprisingly, it turned out that only a few researchers were willing to make 
such radical sacrifices. 


NB Nevertheless, the intuitionistic demand for mathematical rigor survived and 
was partially taken into account in the events to follow. 


Box 2.5 (Intuitionism). 
This school argued for greater mathematical rigor in several ways. 


View of Infinity. Since Aristotle, mathematics understood infinity only as the potentiality (i.e., pos- 
sibility), never as the actuality (i.e., accomplishment). For instance, it is true that natural numbers 
0,1,2,... continue endlessly, yet up to any natural number there are only finitely many of them, 
and when we say that what remains is infinite we only mean that the rest, although growing ever 
larger, remains never accomplished. So, in the classical view infinity is by nature never accom- 
plished, never actual. In contrast, Cantor’s view of infinity was different, indeed radically Platonic: 
“Any set, regardless of its size, is as much real as its members are real,” he boldly advocated. To 
Cantor the set {0,1,2,...} was an actual, accomplished mathematical object. 

Intuitionism returned to the classical view of infinity as potentiality. According to this view, 
using an appropriate procedure, we can find in an infinite set as many members as we wish, but 
never all of them. To treat infinite sets as actual, accomplished unities, is wrong, said intuitionists, 
and may lead to paradoxes (as shown by Russell and others). 


But there are also differences between classical mathematics and intuitionism. (We use symbols
∃ for “exists”; : for “such that”; = for “is”; ¬ for “not”; ∨ for “or”; ∀ for “for all”; see Appendix B.)


16 Luitzen Egbertus Jan Brouwer, 1881-1966, Dutch mathematician and philosopher.
17 Arend Heyting, 1898-1980, Dutch mathematician and logician.


Existence of Objects. Intuitionism treats the existence of mathematical objects differently from
classical mathematics. In classical mathematics, mathematical objects exist per se, as Platonic 
ideas (see p. 10). Consequently, statements about mathematical objects are either true or false. 
Intuitionism does not accept this view. Instead, it advocates that the only things that exist per se are 
mental, mathematical constructions, while the existence of an object that has not been constructed 
remains dubious. For intuitionism, to exist is the same as to be constructed. 

For instance, in classical mathematics, given a set S and a property P sensible for the members 
of S, we are always allowed to indirectly prove that there exists a member of S with the property 
P, i.e., that the statement ∃x ∈ S : P(x) is true. To do this, we first make the hypothesis H = ¬∃x ∈
S : P(x), stating that such a member does not exist. Then we try to deduce from H a contradiction.
If we succeed in this, we conclude that H is false. Now comes the critical step: since classical
mathematics fully accepts the Law of Excluded Middle, there can be no other alternative but to
conclude that ¬H = ∃x ∈ S : P(x) is true, i.e., that such a member of S exists. But note that,
generally, we have no idea about this member, or how to find it. 

Intuitionism does not accept such an indirect proof of existence when the set S is infinite. In- 
deed, it rejects any proof of existence that neither constructs the alleged object, nor describes how 
to construct it at least in principle. 


Use of Logic. The intuitionistic point of view was also reflected in the use of logic. For example, 
because of the Law of Excluded Middle, classical mathematics takes for granted that, for any statement
F, either F or ¬F is true. Hence, the statement F ∨ ¬F is a priori true, even though we may
never determine the truth-values of F and ¬F. Intuitionism, in contrast, treats the truth-values of
F and ¬F as dubious, until they are actually determined in some indisputable way.

To explain the reasons for such caution, let S be a set, P a property sensible for the members of 
S, and F the statement ∀x ∈ S : P(x). So F conjectures that every member of S has the property
P. How can we indisputably determine whether or not F is true? Can we always do this? 

First, we can try to prove in one sweep that all the members of S have the property P. (We 
can use various techniques, such as mathematical induction.) If it turns out that we are unable to 
construct a proof that works for every member of S, it might be that F is false. However, it might 
also be that F is true, where P(x) holds for every x ∈ S, but for a different reason in each case. That
is, our inability to construct a one-sweep proof might be due to the lack of a recognizable pattern,
i.e., a common reason for which different members of S share the property P. In this case, we can
neither prove F (because the “proof” would be infinitely long) nor refute it (because F is true). 

We must therefore resort to some other method to settle the conjecture F. If S is finite, we can
in principle check, for each x ∈ S individually, whether or not P(x) holds. When the checking is
finished, we know either that F is true, or that it is false. But what if S is infinite? We can still do
the checking, but we must be aware of the following. We may check as many members of S as
we like, say 10¹⁸, and find that each of them has the property P; but, generally, there is no way
of knowing whether, for a member yet to be checked, P holds or not. So we keep checking in the
hope that a member without the property P will be reached soon. But, if in truth F is true, the checking will continue
indefinitely, and we will never find out whether F is true or false. (By the way, this is the present 
situation with Goldbach’s Conjecture; see Box 5.4 on p. 100.) 
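
The asymmetry just described can be illustrated with a short Python sketch. It assumes the usual
formulation of Goldbach’s Conjecture (every even number greater than 2 is a sum of two primes) and
checks the property P only for a finite initial part of the infinite set S of even numbers. The
function names and the bound 1000 are illustrative assumptions. Finding a counterexample would
refute F after finitely many steps, whereas finding none up to a bound says nothing about the
members not yet checked.

```python
from typing import Optional


def is_prime(n: int) -> bool:
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def has_property_p(x: int) -> bool:
    """P(x): the even number x is a sum of two primes."""
    return any(is_prime(p) and is_prime(x - p) for p in range(2, x // 2 + 1))


def first_counterexample(bound: int) -> Optional[int]:
    """Check P(x) for every even x with 4 <= x <= bound.

    Returns the first x violating P, or None if none was found up to the
    bound. None does NOT establish that F is true.
    """
    for x in range(4, bound + 1, 2):
        if not has_property_p(x):
            return x
    return None


print(first_counterexample(1000))  # None: no counterexample among the checked members
```

Raising the bound only enlarges the finite part of S that has been checked; the question about
the remaining members stays open.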

Finally, we may try to prove F = ∀x ∈ S : P(x) by contradiction. As usual, we assume the
negation, i.e., that ¬∀x ∈ S : P(x) is true. In classical mathematics, where the equivalence ¬∀x ∈
S : P(x) ⟺ ∃x ∈ S : ¬P(x) holds for arbitrary S, we would try to deduce a contradiction from
the more promising right-hand side of the equivalence. In intuitionism, however, the equivalence
does not a priori hold; namely, if S is infinite, the statement ∃x ∈ S : ¬P(x) is dubious until we
have constructed an x ∈ S for which ¬P(x) holds. (As we have seen above, this may not be easy.)
As long as ∃x ∈ S : ¬P(x) is dubious, it cannot be used to deduce a contradiction, and our proof
by contradiction is stalled.

So in some situations the truth-value of a statement F cannot be indisputably determined. 




2.2.3 Logicism 


Logicism aimed to found mathematics on pure logic. As a side-effect it developed 
the notation by which mathematics was at last given concise and precise expression. 
The main contributions to this school were made by Boole, Frege, Peano, Russell, 
and Whitehead. 


Fig. 2.5 George Boole
(Courtesy: See Preface)

Fig. 2.6 Gottlob Frege
(Courtesy: See Preface)

Fig. 2.7 Giuseppe Peano
(Courtesy: See Preface)


Boole


In the middle of the nineteenth century scientists noticed that, from Aristotle on- 
ward, logical deduction had been using various self-evident rules of inference that, 
surprisingly, had never been rigorously analyzed and written down. 

Boole¹⁸ was among the first to become aware of the pitfalls of this. He embarked
on the question of how to express logical statements by means of algebraic ex- 
pressions containing the operations “and”, “or”, and “not”, and then algebraically 
manipulate these expressions to pursue logical deduction. He described his discov- 
eries in the book The Laws of Thought (1854) and thus founded algebraic logic. His 
logic was further developed by Peirce¹⁹ and others in the early twentieth century to
become what is now known as Propositional Calculus P (see Appendix A). Since 
then a more precise and clear expression of logical statements has been possible. 


Frege and Peano 


Frege²⁰ was aiming even higher. His goal was to show that arithmetic can be deduced
from pure logic. In particular, he planned to define number-theoretic notions
(i.e., numbers, relations, and operations on numbers) by pure logical notions, and to 
deduce arithmetical axioms from logical axioms. 


18 George Boole, 1815-1864, English mathematician and philosopher.
19 Charles Sanders Peirce, 1839-1914, American philosopher, logician, mathematician, and scientist.
20 Friedrich Ludwig Gottlob Frege, 1848-1925, German mathematician, logician, and philosopher.


Like Boole, Frege was well aware that a natural language, such as German,
has structural, rhetorical, psychological, and other characteristics that often blur the 
meaning of its own statements and, consequently, the argumentation of the deduc- 
tions. This required the introduction of a new, formal notation by which mathematics 
and logic could be given concise and precise expression. In particular, such a nota- 
tion should be able to isolate all the important logical principles of inference while 
throwing off the lumber of natural language. In other words, the notation should be 
able to support purely logical deduction. So, in 1879, Frege proposed his Begriffs- 
schrift, a “conceptual notation”, capable of giving mathematics and logic better ex- 
pression. Begriffsschrift was based on an alphabet of symbols, from which math- 
ematical and logical expressions were constructed using rules of construction. An 
important innovation of Frege was that these rules directed the purely mechanical 
manipulation of symbols, without appealing to intuition or to the (possible) mean- 
ing of symbols. In addition, Frege introduced quantified variables and thus laid the 
foundations of the First-Order Logic (which we will describe later). The inferences 
were described diagrammatically, so they were in this respect somewhat unusual. 
Nevertheless, Begriffsschrift was capable of precisely and concisely representing 
the inferences that involved arbitrary mathematical statements. 

At the same time, Peano²¹ developed another symbolic language for expressing
mathematical statements. He used innovative logical symbols (e.g., ∈, ⇒) in order
to distinguish between logical and other operations. In 1895, he published a book
Formulario Mathematico where he expressed fundamental theorems of mathematics
in his symbolic language. Peano’s notation proved to be more practical than Frege’s 
notation and after having gone through further development is in common use today. 

In short, among Frege’s and Peano’s contributions to logic were the analysis 
of logical concepts, the foundation of the First-Order Logic L (see Appendix A, 
p. 364), and the introduction of a standard formal notation. 


Russell and Whitehead 


Russell’s goal was even more ambitious than Frege’s. He wanted to deduce all math- 
ematics from logic. Namely, at the end of the nineteenth century it had already 
been shown that many concepts of algebra and analysis can be defined by means of 
number-theoretic notions, which, in turn, can be defined with purely logical notions. 
To avoid his own paradox, Russell invented the Theory of Types. There are three 
requirements in this theory: 1) A hierarchy of types must be established. A type can 
be a member of any well-ordered set, e.g., a natural number. 2) Each mathematical 
object must be assigned to a type. 3) Each mathematical object must be constructed 
exclusively from objects of lower types in the hierarchy. As a result, the set 𝒰 of all
sets cannot exist in this theory (because 𝒰 ∈ 𝒰), and neither does Russell’s Paradox
(for if R existed, we would have R ∉ R because of its type, and consequently R ∈ R
due to its definition: a contradiction). Similarly, Ω would not exist (as Ω ∈ Ω).
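
The effect of the three requirements can be illustrated with a deliberately simplified Python
sketch. It is not Russell’s actual theory: types are assumed to be natural numbers, individuals
that are not sets are assumed to have type 0, and the class name TypedSet is hypothetical. The
only point shown is that a set can be built exclusively from objects of lower types, so no set
can be a member of itself or of a set of its own type.

```python
class TypedSet:
    """A set tagged with a type; types are natural numbers in this sketch."""

    def __init__(self, level: int, members=()):
        members = tuple(members)
        for m in members:
            # Individuals (anything that is not a TypedSet) count as type 0.
            member_level = m.level if isinstance(m, TypedSet) else 0
            if member_level >= level:
                raise TypeError("members must be of a lower type than the set")
        self.level = level
        self.members = members

    def __contains__(self, item) -> bool:
        return item in self.members


individuals = TypedSet(1, ["x", "y"])   # type 1: a set of individuals
family = TypedSet(2, [individuals])     # type 2: a set of type-1 sets

print(individuals in family)   # True
print(family in family)        # False: family is not among its own members

try:
    TypedSet(2, [family])      # a type-2 set cannot contain a type-2 set
except TypeError as err:
    print("rejected:", err)
```

In this sketch a collection of all sets of a given type is always of a higher type, which is the
informal reason why a set of all sets cannot be formed.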


21 Giuseppe Peano, 1858-1932, Italian mathematician. 


These ideas were described in the 1910-13 book Principia Mathematica (PM)
by Whitehead²² and Russell. Using symbolic notation based on Peano’s work, they
developed from logical and three additional axioms the theory of sets and cardinal, 
ordinal, and real numbers, while avoiding all known paradoxes. The deductions were 
long, even cumbersome, yet many shared the opinion that the remaining fields of 
mathematics could also be deduced (at least in principle). 


Fig. 2.8 Alfred Whitehead
(Courtesy: See Preface)

Fig. 2.9 Bertrand Russell
(Courtesy: See Preface)


Did Principia Mathematica put an end to the crisis in mathematics? Not really.
There were imperfections in PM. First of all, there was a kind of aesthetic flaw in
the set of PM’s axioms, because in addition to logical axioms there were three axioms
not recognized as purely logical. One of these was the Axiom of Choice.
More importantly, it remained unclear whether PM is consistent, i.e., whether it avoids
not only all known paradoxes but also all the other paradoxes that may still be hidden in
various fields of mathematics, patiently awaiting their discovery. This question became
known as the Consistency Problem of PM. In addition, it was not clear whether
PM was complete, i.e., whether exactly all true statements are provable within PM.
This was the Completeness Problem of PM. Consequently, PM was not widely accepted.²³


NB Nevertheless, PM was all-important for future events, because it finally devel- 
oped 1) a symbolic language for the concise and precise expression of mathematical 
statements from an arbitrary field of mathematics; and 2) a concise formulation of 
all the rules of inference used in the deduction of mathematical theorems. In addition,
PM led to a clear formulation of the problems of consistency and completeness
of a particular axiom system, namely PM.²⁴


22 Alfred North Whitehead, 1861-1947, British mathematician and philosopher. 


23 In addition, it would soon turn out that Russell’s Paradox, as well as other paradoxes stemming
from Cantor’s liberal Axiom of Abstraction, can be eliminated just by a two-level hierarchy of sets 
and classes, instead of the complicated infinite hierarchy of types. (See Box 3.7 on p. 50.) 


24 As we will see in Chap. 4, the two problems were later solved in general by Gödel.


The concepts and tools developed by intuitionism and logicism were used by formalism,
the third of the schools that attempted to resolve the crisis in mathematics.


2.2.4 Formalism 


Formalism could not accept the radical measures suggested by intuitionism. It 
wished to keep all classical mathematics. After all, classical mathematics had been 
proving its immense usefulness from the very beginning. 

To achieve this, formalism focused on a radically different aspect of human mathematical
activity. The subject of the formalists’ research was not the meaning (i.e., semantics,
contents) of mathematical expressions and inferences, but their structure (i.e., syntax,
form). Formalism focused on the formal-language formulation of human mathematical
activities and their results, as well as on the relations between these formulations.

This school was initiated by Hilbert²⁵ and then developed in close collaboration
with Ackermann,²⁶ Bernays,²⁷ and others.


Fig. 2.10 David Hilbert 
(Courtesy: See Preface) 


Syntax vs. Semantics 


Hilbert became fully aware that it is sensible to draw a distinction between syntactic 
notions (i.e., notions referring to the structure of mathematical expressions) and semantic
notions (i.e., notions referring to the meaning of mathematical expressions).
For instance, the interpretation of a theory is a semantic notion. Recall that inter- 
pretation gives a meaning to a theory developed in a hypothetical axiomatic system 
(see p. 12). To describe the interpretation one needs to describe its domain, which is, 
mathematically, a set. But the concept of a set was not clear at that time. So Hilbert 
advocated a focus on syntactic notions, as the research of these seemed to require 


25 David Hilbert, 1862-1943, German mathematician. 
26 Wilhelm Friedrich Ackermann, 1896-1962, German mathematician. 
27 Paul Isaac Bernays, 1888-1977, Swiss mathematician. 
only the non-problematic parts of mathematics, that is, basic logic and some basic
arithmetic. 

Let us describe these ideas in greater detail. It had been seen that mathematical
concepts, such as that of the set, may be vague; hence inferences incorporating
such concepts may be false and may eventually lead to paradoxes. On the other
hand, mathematical notions are always expressed in the words of some language, 
either natural, such as English, or symbolic, designed just for this purpose. A word 
is a finite sequence of symbols from some finite alphabet. Formalism noticed that ev- 
ery symbol is perfectly clear per se, that is, a symbol is comprehended as soon as it 
is recognized as a discrete part of the reality, without any further intuitive or logical 
analysis. This comprehension of symbols is independent of their intended meaning, 
which might previously be associated with them (such as the operation of addition 
with the symbol “+”). So why not comprehend words in that manner as well? One
should only ignore the intended meaning of the word at hand and comprehend and 
treat it simply as a finite sequence of symbols. Expressions, i.e., sequences of words, 
could also be treated in the same fashion and, finally, sequences of expressions too. 

After the banishment of meaning from language constructs, one would be free 
to focus on their structure (syntax). But why do that? The reason is that one could 
found mathematical inference on a clear and precise structure (syntax) of language 
constructs, instead of on their (sometimes) unclear meaning (semantics). The syntax 
is always clear, provided it is rigorously and precisely defined (as was the case 
with logicism). As a result, a proof (deduction) would simply be a finite sequence 
of language constructs (expressions), built according to a finite number of rules. 
The gain would be improved control over the process of deduction and, finally, the 
elimination of paradoxes. 


Formal Axiomatic Systems 


In order to implement these ideas, formalism invented formal axiomatic systems. 
Each such system offers 1) a rigorously defined symbolic language; 2) a set of rules 
of construction, i.e., syntactic rules that are used to build well-formed expressions,
called formulas, of the language; and 3) a set of rules of inference that are used 
to build well-formed sequences of formulas, called derivations or formal proofs. 
Each formula or derivation is viewed and treated exclusively as a finite sequence of 
symbols of the language. Hence, though each formula has a definite structure, no 
meaning is to be seen or searched for in it. Some of the formulas are distinguished 
as axioms. Given a finite set of formulas, one may infer a new formula by applying 
a rule of inference. Formulas that can be derived by a finite sequence of inferences 
from axioms only are called theorems. Axioms, theorems, and other formulas make 
up the theory belonging to the formal axiomatic system at hand. A detailed discus- 
sion of formal axiomatic systems and their theories will appear in the next chapter. 
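
The purely mechanical character of a derivation can be illustrated with a minimal Python sketch.
Everything in it is an assumption made for illustration: formulas are encoded as strings (atomic
formulas) or tuples ("imp", G, F) standing for “G implies F”, the axioms are three arbitrary
formulas, and the only rule of inference allows F to be inferred from G and ("imp", G, F). The
checker never looks at what the symbols mean; it only compares their structure.

```python
def follows_by_rule(formula, earlier) -> bool:
    """Illustrative rule: `formula` follows if some G and ("imp", G, formula)
    both occur among the earlier formulas."""
    return any(("imp", g, formula) in earlier for g in earlier)


def is_derivation(lines, axioms) -> bool:
    """Check that every formula is an axiom or follows from preceding ones."""
    if not lines:
        return False
    for i, formula in enumerate(lines):
        if formula not in axioms and not follows_by_rule(formula, lines[:i]):
            return False
    return True


# Hypothetical axioms and a candidate derivation of the formula "r".
axioms = {"p", ("imp", "p", "q"), ("imp", "q", "r")}
derivation = ["p", ("imp", "p", "q"), "q", ("imp", "q", "r"), "r"]

print(is_derivation(derivation, axioms))   # True, so "r" is a theorem here
print(is_derivation(["r"], axioms))        # False: "r" alone is not justified
```

A formula is a theorem of this toy system exactly when some finite list ending in it passes the
check.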


Interpretation


Let us stress that formalists were aware of the fact that there was a limit to neglect- 
ing the semantics. After all, their ultimate goal was to establish conditions for the 
development of sound, safely applicable theories. They were aware that each theory, 
developed in a formal axiomatic system, should eventually be given some meaning; 
otherwise it would be of no use. In other words, the theory should be interpreted. 
Informally, an interpretation of a theory in a field of interest maps formulas of the 
theory into statements about (some) objects of the field. We will discuss interpreta- 
tion again shortly. 


NB Formalism cast out the issues of meaning from the development of a theory, 
and shifted them to a later interpretation. What were the expected benefits of this? 
Such a theory could clearly show the syntactic properties of its expressions and 
expose various relations between these properties. Laid bare, the whole theory could 
be examined by metamathematics and subjected to its judgment. 


Metamathematics 


When a theory is developed in a formal axiomatic system, the only things that can 
be examined within or about it are its expressions, the syntactic properties of expres- 
sions, and the relations between them. All these are unambiguously determined by 
the formal system (i.e., its language and rules of construction and inference). Thus, 
syntactic aspects of the theory can be systematically analyzed without the interfer- 
ence of semantic issues. Only now can one raise well-defined questions about the 
theory and propose answers to such questions. 


[Figure: the statement “This theory is inconsistent.”]


Fig. 2.11 A statement about the theory belongs to its metatheory 


But questions and statements about the theory are no longer part of the theory.
Instead, they belong to the higher “theory about the theory,” which is called
a metatheory, or, more generally, metamathematics.²⁸ Thus, the subject matter of a
metatheory is some other theory. 


28 meta- (Greek μετά) = after, beyond, about


Metamathematical statements are formulated in a metalanguage. This is a natural
language or a fragment of it that is appropriately augmented by special symbols and
other objects. We will say more about metalanguage and two such symbols, ⊢ and
⊨, shortly.

The proving of metamathematical statements is still necessary; however, it is 
not formal, in contrast to proving within the formal system. Instead, the usual (i.e., 
semantic, informal) proving is used, where each inference in a metamathemati- 
cal proof must be grounded in the meaning of its premises. Of course, premises 
are metamathematical statements, so they can refer only to syntactic aspects of 
the theory. 

In addition, to avoid any doubts that might arise because of the use of infinity, 
only finite objects and techniques are allowed in metamathematical proofs. Such a 
cautious and indisputable way of reasoning is called finitism. 


Goals of Formalism 


Formalism harbored hopes that the analysis of formal systems would provide an- 
swers to many important metamathematical questions about the theories of interest. 
Specifically, these were the two well-known questions concerning mathematics de- 
veloped in Principia Mathematica (see p. 25): 


• The Consistency Problem of PM = “Is the math developed in PM consistent?”
• The Completeness Problem of PM = “Is the math developed in PM complete?”


But the ultimate goals of formalism were even more ambitious. Specifically, for- 
malists intended to: 


1. develop all mathematics in one formal axiomatic system; 
2. prove that such mathematics is free of all known and unknown paradoxes. 


2.3 Chapter Summary 


The axiomatic method has been used to develop mathematics since its beginnings. The
evident axiomatic system required that basic notions and axioms be clearly con- 
firmed by reality. Since it was found that human experience and intuition may be 
misleading, the hypothetical axiomatic system was introduced. Here, axioms are 
only hypotheses whose fertility is more important than their link to reality. Such 
axiomatic systems offered more freedom in the search for interesting and useful 
theories. This approach was taken by Cantor when he developed his Set Theory. 
Because Cantor treated the existence of infinite sets naively, this resulted in several 
paradoxes in his theory. 

Intuitionism, logicism, and formalism were three schools that reflected critically 
on the mathematical and logical notions and concepts that might be the cause of 
paradoxes. 

Intuitionism advocated for greater rigor in the process of proving and for the
non-Platonic view that the existence of mathematical objects is closely connected
to the existence of their mental constructions. Intuitionism reconstructed several
parts of classical mathematics in a form that was free of all known paradoxes. But, at the
same time, large parts of mathematics had to be cast off, as it seemed impossible to 
reconstruct them in the intuitionistic manner. Few researchers were willing to make 
such a sacrifice. 

Logicism, the second school, developed a formal notation by which mathematics
was given concise and precise expression. It also produced Principia Mathematica, a
book that finally developed a symbolic language of mathematics and concisely formulated
its rules of inference. In addition, it brought an awareness of the importance
of the problems of consistency and completeness of axiomatic theories. 

The third school, formalism, built on the ideas and tools developed by intuition- 
ism and logicism, and aspired to retain all mathematics. Formalism acknowledged 
that the syntax and semantics of mathematical expressions should be clearly sep- 
arated and dealt with in succession. It introduced the concept of the formal ax- 
iomatic system, i.e., an environment for the mechanical, syntax-oriented develop- 
ment of a theory. In addition, it introduced a clear distinction between a theory and 
its metatheory. 


Chapter 3
Formalism


The form of something is its shape and structure. Something that 
is done in a formal way has a very ordered, organized method 
and style. Formalism is a style in which great attention is paid to 
the form rather than to the contents of things. 


Abstract The great ideas and tools that intuitionism and logicism discovered in 
solving the crisis in mathematics were gathered by formalism in the concept of the 
formal axiomatic system. Later, formal axiomatic systems led to seminal discoveries 
about axiomatic theories and mathematics in general. Particularly important to us is 
the fact that formal axiomatic systems also gave rise to the need for a deeper under- 
standing of the concepts of algorithm and computation. To appreciate this need, we 
devote this chapter to the understanding of formal axiomatic systems in general and 
describe those particular formal axiomatic systems that played a crucial role in the 
events to follow. 


3.1 Formal Axiomatic Systems and Theories 


In this section we will describe what a formal axiomatic system is and how a theory 
is developed in such a system. We will then show how meaning, and consequently 
a possible application, is given to a formally developed theory. Finally, we will de- 
scribe several formal axiomatic systems and their theories that played important 
roles in the development of the notions of algorithm and computation. 


3.1.1 What Is a Formal Axiomatic System? 


A formal axiomatic system (for short f.a.s.) F is determined by three entities: a sym- 
bolic language, a set of axioms, and a set of rules of inference. 


Symbolic Language


The basic building blocks of the symbolic language are symbols. There are a countable
(potentially infinite) number of them and they constitute the alphabet of the
language. In the alphabet there are individual-constant symbols (e.g., a, b, c),
individual-variable symbols (e.g., x, y, z), function symbols (e.g., f, g, h), predicate
symbols (e.g., P, Q, R), logical connectives (e.g., ∨, ∧, ⇒, ⇔, ¬), quantification
symbols (e.g., ∀, ∃), and punctuation marks (e.g., comma “,”; colon “:”; parentheses
“(”, “)”). Usually, there is the equality symbol (i.e., =). In some cases, certain
function symbols will be designated as function-variable symbols and certain predicate
symbols will be designated as predicate-variable symbols. (The reasons for this
naming of the symbols will become clear when we discuss their interpretation.)

From symbols one constructs larger building blocks of the language, i.e., symbols 
are combined into arbitrary finite sequences called words. Some of these are called 
terms and are inductively defined by the following rule of construction: A term is 
either an individual-constant symbol or an individual-variable symbol, or it is a word 
f(t₁, t₂, ..., tₖ), where the tᵢ are terms and f is a k-ary function symbol.

A formula is defined inductively by another syntactical rule of construction:
A formula is an expression P(t₁, t₂, ..., tₖ), where the tᵢ are terms and P is a k-ary
predicate symbol; or it is an expression t₁ = t₂, where t₁, t₂ are terms; or it is one
of the expressions F ∨ G, F ∧ G, F ⇒ G, F ⇔ G, ¬F, ∀τF, ∃τF, where F and G are
formulas and τ is a variable symbol.
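
The two inductive rules of construction translate almost literally into a pair of recursive
checks. The following Python sketch is only an illustration under stated assumptions: words are
encoded as strings (symbols) or nested tuples whose first component is the leading symbol, the
connective names "or", "and", "imp", "iff", "not", "forall", "exists", "eq" are a hypothetical
encoding, and the small alphabet with its arities is chosen arbitrarily for a first-order
language.

```python
# A hypothetical alphabet; arities are assumptions made for this sketch.
CONSTANTS = {"a", "b", "c"}
VARIABLES = {"x", "y", "z"}
FUNCTIONS = {"f": 1, "g": 2}       # function symbol -> arity k
PREDICATES = {"P": 2, "Q": 3}      # predicate symbol -> arity k
BINARY = {"or", "and", "imp", "iff"}


def is_term(w) -> bool:
    """A term is a constant, a variable, or f(t1, ..., tk) with terms ti."""
    if isinstance(w, str):
        return w in CONSTANTS or w in VARIABLES
    if not isinstance(w, tuple) or not w or not isinstance(w[0], str):
        return False
    head, *args = w
    return FUNCTIONS.get(head) == len(args) and all(is_term(t) for t in args)


def is_formula(w) -> bool:
    """A formula is built by the inductive rule of construction for formulas."""
    if not isinstance(w, tuple) or not w or not isinstance(w[0], str):
        return False
    head, *args = w
    if head in PREDICATES:
        return PREDICATES[head] == len(args) and all(is_term(t) for t in args)
    if head == "eq":
        return len(args) == 2 and all(is_term(t) for t in args)
    if head in BINARY:
        return len(args) == 2 and all(is_formula(f) for f in args)
    if head == "not":
        return len(args) == 1 and is_formula(args[0])
    if head in ("forall", "exists"):
        return len(args) == 2 and args[0] in VARIABLES and is_formula(args[1])
    return False


print(is_term(("g", "a", ("f", "x"))))                       # True
print(is_formula(("or", ("P", "a", "x"), ("not", ("P", "x", "y")))))  # True
print(is_formula(("P", "a")))                                # False: wrong arity
```

Both checks inspect only the shape of a word, never its intended meaning.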

The symbols ∀ and ∃ are called the universal and existential quantification symbol,
respectively, while ∀τ and ∃τ, where τ is a variable symbol, are called the
universal and existential quantifier, respectively. If τ immediately following ∀ or ∃
can only be an individual-variable symbol, then the symbolic language is said to be
of the first order. If, however, τ can be a function-variable symbol or a predicate-
variable symbol, then the language is of the second order.

If ∀τF and ∃τF are formulas, F is called the scope of ∀τ and ∃τ, respectively. An
occurrence of a variable symbol σ is bound in a formula G iff either σ is the variable
of ∀ or ∃ in G, or it is within the scope of ∀σ or ∃σ in G. Otherwise, the occurrence
is said to be free in G. We say that the variable symbol σ is bound (free) in G iff σ
has a bound (free) occurrence in G. A formula with at least one free variable symbol 
is said to be open. A formula with no free variable symbols is said to be closed; a 
closed formula is also called a sentence. If F is a formula and t is a term, then t
is said to be free for x in F iff no free occurrences of x in F lie within the scope of any
quantifier ∀y or ∃y, where y is a variable in t.

Notice that the construction of the building blocks of the language is governed ex- 
clusively by rules of construction that are syntactic by nature. Consequently, neither 
intended nor possible meanings of the building blocks interfere in their construction. 


Example 3.1. (Term, Formula, Sentence) The symbols a and x are terms. If f and g are function
symbols, then f(x) and g(a, f(x)) are both terms. If P and Q are predicate symbols, then P(a, x)
and Q(a, x, f(x)) are formulas; so is P(a, x) ∨ Q(a, x, f(x)). The formula ∀x∃yP(x, y) is a sentence
because its individual-variable symbols x and y are bound by ∀ and ∃, respectively. The formula
∀x∃yR(x, y, z) is open because the individual-variable symbol z is free in it. If h is a function-
variable symbol, then the formula ∀hP(a, h(a)) belongs to a second-order language.
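
The distinction between open and closed formulas in Example 3.1 can be checked mechanically. The
Python sketch below is an illustration only: formulas are assumed to be encoded as nested tuples
whose first component is the leading symbol, the names "forall", "exists", "or", "and", "imp",
"iff", "not" are a hypothetical encoding of the quantification symbols and connectives, and the
variable symbols are assumed to be x, y, z.

```python
VARIABLES = {"x", "y", "z"}
QUANTIFIERS = {"forall", "exists"}
CONNECTIVES = {"or", "and", "imp", "iff", "not"}


def term_vars(t) -> set:
    """Variable symbols occurring in a term."""
    if isinstance(t, str):
        return {t} if t in VARIABLES else set()
    return set().union(*(term_vars(arg) for arg in t[1:]))


def free_vars(formula) -> set:
    """Variable symbols that have at least one free occurrence."""
    head, *args = formula
    if head in QUANTIFIERS:
        var, body = args
        return free_vars(body) - {var}      # the quantifier binds var in its scope
    if head in CONNECTIVES:
        return set().union(*(free_vars(f) for f in args))
    # Atomic formula: a predicate symbol applied to terms.
    return set().union(*(term_vars(t) for t in args))


def is_sentence(formula) -> bool:
    """A sentence (closed formula) has no free variable symbols."""
    return not free_vars(formula)


closed = ("forall", "x", ("exists", "y", ("P", "x", "y")))
opened = ("forall", "x", ("exists", "y", ("R", "x", "y", "z")))

print(is_sentence(closed))   # True
print(free_vars(opened))     # {'z'}
```

The sketch covers only first-order formulas; the second-order formula of the example would
require function-variable symbols as well.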


Axioms


Axioms are selected formulas. If there is a procedure that can decide whether a for- 
mula is an axiom, then the set of axioms is said to be computable and the theory 
developed in the f.a.s. is said to be computably axiomatizable, or just axiomatizable. 
There are logical and proper (i.e., non-logical) axioms. Logical axioms are present 
in every f.a.s., while proper axioms vary from system to system. As we will see 
shortly, logical axioms are intended to epitomize the principles of pure logical re- 
flection, while proper axioms condense other special basic notions and facts. 
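
What a procedure deciding axiomhood might look like can be sketched in Python. The sketch rests
on assumptions that are not taken from the text: formulas are encoded as strings or nested
tuples, ("imp", A, B) stands for A ⇒ B, and the axioms are taken to be all instances of the
single schema F ⇒ (G ⇒ F), which is a common logical axiom schema in Hilbert-style systems. The
set of such axioms is infinite, yet membership is decided by a finite structural comparison.

```python
def is_axiom(formula) -> bool:
    """Decide whether `formula` is an instance of the schema F => (G => F)."""
    if not (isinstance(formula, tuple) and len(formula) == 3
            and formula[0] == "imp"):
        return False
    f, inner = formula[1], formula[2]
    return (isinstance(inner, tuple) and len(inner) == 3
            and inner[0] == "imp" and inner[2] == f)


print(is_axiom(("imp", "p", ("imp", "q", "p"))))   # True
print(is_axiom(("imp", "p", ("imp", "q", "r"))))   # False
```

The procedure always halts with a yes or no answer, which is what makes this particular set of
axioms computable in the sense described above.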


Rules of Inference 


A rule of inference, say ℛ, specifies the conditions in which, given a set P of formulas,
called the premises of ℛ, one is allowed to derive another formula, say F. The
formula F is called the conclusion drawn from premises P by the rule of inference
ℛ. We also say that F directly follows from premises P by the rule ℛ and write


P ⊢_ℛ F. (Rule of inf. ℛ)


Two usual rules of inference are Modus Ponens and Generalization. Modus Ponens 
(MP) says: “If G and G ⇒ F are premises, then the conclusion F directly follows.”
That is: “If G is asserted to hold, and G implies F, then also F holds.” In short:


G, G ⇒ F ⊢_MP F. (Modus Ponens)


Generalization (Gen) says: “If F(x) is a premise, then the conclusion VxF(x) di- 
rectly follows.” That is: “If F holds for an unspecified x, then F holds for every x.” 


In short: Ge 


F(x) - VxF(x). (Generalization) 


Example 3.2. (Inference) Greg now plays guitar. If Greg plays guitar, Becky sings. So (by MP), 
she now sings. Becky likes ice cream. So (by Gen), she likes vanilla, lemon, ...ice cream. 


Development of the Theory 


When the symbolic language, axioms, and rules of construction are fixed, the development
of the theory belonging to the defined formal axiomatic system F can start.¹


¹ In the Platonic view, as soon as F is defined, also the theory belonging to F is perfectly defined:
it consists of all those propositions that are provable within F—regardless of whether or not they 
have actually been proved. In this sense, the theory is “static” and defined from its very birth. 
Adopting the Platonic view allows us to identify an f.a.s. and its theory and denote both by F. The 
development of the theory F is by discovering new (existing) theorems, i.e., finding their proofs. 
In contrast, the constructivist view takes the theory to consist of all propositions that have been 
proved within F. At its birth, the theory only contains F’s axioms, and then grows by absorbing new 
propositions as they are proved. At any stage of its development, the theory is denoted by th(F). 
During the development, the following strict rules must be obeyed. Firstly, each new
notion must be defined by the basic notions or previously defined notions. Secondly,
each new proposition must be derived (i.e., formally proved) before it is named a
theorem. Here, a derivation (i.e., formal proof) of a formula F in the theory F is a
finite sequence of formulas such that 1) the last formula of the sequence is F, and
2) each formula of the sequence is either an axiom of F, or it directly follows from
some of the preceding formulas of the sequence by one of the rules of inference of
F. Such a formula F is called a theorem of the theory F. That F is a theorem of F we
denote by


⊢_F F.


Example 3.3. (Derivation) Let us derive a simple formula t = t in the formal axiomatic system
A, called the Formal Arithmetic. (We will describe A in detail on p. 46.) In the derivation below,
the left column contains enumerated formulas that appear in the derivation, and the right column
explains, for each formula, how the formula was inferred. At this point, the symbols ⊕ and 0 should
not be given any meaning although they resemble the usual symbols for addition and the number
zero.


1. t ⊕ 0 = t                            Axiom 5 of A with x substituted by t.
2. t ⊕ 0 = t ⇒ (t ⊕ 0 = t ⇒ t = t)      Axiom 1 of A with x, y, z subst. by t ⊕ 0, t, t, resp.
3. t ⊕ 0 = t ⇒ t = t                    Modus Ponens of premises 1. and 2.
4. t = t                                Modus Ponens of premises 1. and 3.


Derivations are finite sequences of formulas, which are in turn finite sequences of symbols. So,
formally, a derivation is a finite sequence of symbols. For example, the above derivation is the
sequence


t ⊕ 0 = t   t ⊕ 0 = t ⇒ (t ⊕ 0 = t ⇒ t = t)   t ⊕ 0 = t ⇒ t = t   t = t


It seems that the development of a theory F in a formal axiomatic system is 
nothing more than a meaningless manipulation of symbols that runs according to a 
given set of strict rules. It is a kind of a game of symbols regulated by certain rules. 
In other words, the development of the theory F is strictly formal, so F is a rather 
strange-looking theory. Yet the reasons for this are rather meaningful: 


It is in principle much easier to maintain and check the validity of a proof 
that is being constructed in a formal axiomatic system 
than a proof being constructed in a non-formal axiomatic system 
where validity of inferences is decided by creative human thought. 


This is because in a formal axiomatic system deduction is more of a mechanical 
process, so it is less prone to human errors. Such a deduction, now called a deriva- 
tion, can be checked in a purely combinatorial way by checking whether the formal 
rules of construction and inference have been obeyed. In contrast, the situation in a 
non-formal axiomatic system is quite different: There, a proof is a mental process 
that involves the meaning of the constituents of the proof and the relations between 
them. As such the proof is vulnerable and prone to errors because of man’s subjec- 
tive judgment whether an inference is valid. 
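
The purely combinatorial checking of a derivation can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any particular proof checker: formulas are encoded as strings (atomic) or tuples ("=>", G, F), the axiom instances are simply handed to the checker as a set, and only Modus Ponens is supported (Generalization and axiom schemas are omitted). The names follows_by_mp and is_derivation are hypothetical.

```python
def follows_by_mp(formula, earlier):
    """True if some G and ("=>", G, formula) both occur among the earlier formulas."""
    return any(("=>", g, formula) in earlier for g in earlier)


def is_derivation(steps, axioms, target):
    """Check that steps is a derivation of target from the given axiom instances.

    Every step must be an axiom instance or follow from preceding steps
    by Modus Ponens, and the last step must be the target formula.
    """
    earlier = []
    for formula in steps:
        if formula not in axioms and not follows_by_mp(formula, earlier):
            return False
        earlier.append(formula)
    return bool(steps) and steps[-1] == target


# The four steps of Example 3.3, with atomic formulas written as plain strings.
a = "t+0=t"                          # stands for the formula t ⊕ 0 = t
b = "t=t"
axiom5 = a                           # instance of Axiom 5
axiom1 = ("=>", a, ("=>", a, b))     # instance of Axiom 1
steps = [axiom5, axiom1, ("=>", a, b), b]

print(is_derivation(steps, {axiom5, axiom1}, b))
print(is_derivation([b], {axiom5, axiom1}, b))
```

The checker never consults the meaning of a formula; it only compares shapes of symbol sequences, which is the sense in which checking a derivation is mechanical.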
3.1.2 The Notion of Truth


In the previous chapter, we stated on several occasions that something was true 
or false. In phrasing our statement, we used the words “true” and “false” in the 
way customary in natural language such as the whole of English. Because of this, 
we assumed without hesitation that the reader would intuitively comprehend the 
statement and thus need no further explanation. Nevertheless, we did touch on an 
important notion, the notion of truth. Since truth will explicitly or implicitly play a 
significant role in the rest of the book, we devote this section to shedding some light 
on it. In this we will follow Tarski (see Bibliographic Notes to Chapter 4). 


Elusiveness of the Notion of Truth 


The prevailing usage of the words “true” and “truth” originates in the classical con- 
ception of truth. This conception is epitomized by the following saying from Aris- 
totle’s Metaphysics: 


To say of what is that it is not, or of what is not that it is, is false, 
while to say of what is that it is, and of what is not that it is not, is true. 


Based on this, most early modern philosophers of the 1800s professed to accept as 
a definition of the notion of truth something like the following: 


Truth is agreement of thought with its object. (1a) 
Truth is correspondence to reality. (1b) 


However, no final definition was accepted because philosophers’ opinions differed
about the location of the objects and, hence, about the nature of agreement.
Specifically, should our thought agree with (i) objects located in an external world; 
or (ii) objects located in our mind along with our thought; or (iii) the interac- 
tion between our mind and the external world? A similar situation emerged in 
analytic philosophy, the mainstream Anglophone philosophy since the beginning 
of the 1900s, which incorporated mathematical precision and argumentative clar- 
ity. Analytic philosophers differed in their views of what “correspondence” and 
“reality” mean. 


NB All in all, the definition of the notion of truth proved to be elusive. 


It is therefore not surprising that this elusiveness—added to the well-known 
perplexities of infinity, deceptiveness of intuition, and the recently discovered 
paradoxes—made mathematicians of the 1900s suspicious of the notion of truth, 
thus avoiding its use in their endeavor. For example, formalism avoided dealing 
with truth by focusing on mechanical symbol manipulation (see previous page and 
Sect. 2.2.4) and postponing the matters of truth to later stages (see Sect. 3.1.3). 
Tarski on the Notion of Truth


In contrast, Tarski² foresaw important applications of the notion of truth in mathematics.
He embarked on a project to rehabilitate it.


Fig. 3.1 Alfred Tarski 
(Courtesy: See Preface) 


Tarski insisted that the answer to the question “What is truth?” should fulfill the 
following two natural requirements: it should give (i) a definition of the notion of 
truth; and (ii) a description of usage of the notion that would agree with its prevailing 
usage in natural language. Accordingly, Tarski first focused on natural languages. 
See Box 3.1 for some of his discoveries. 


Box 3.1 (Definition of Truth for Natural Languages). 


Tarski started with the following question: “What things bear truth or falsity?” The obvious answer 
was that it is sentences that are the bearers of truth because it is a sentence that must be written or 
spoken if we want to express something true or false, assuming that the sentence is meaningful.³


Material Adequacy of the Definition of Truth 


Tarski then founded his research into the notion of truth on the following principle: 
Saying something is equivalent to saying it is true. (2) 
So, saying that snow is white is equivalent to saying that the sentence “Snow is white.” is true: 


“Snow is white.” is true iff snow is white. (2a) 


Note that the equivalence (2a) defines the meaning of the word “true” only when “true” refers to 
the particular sentence “Snow is white.” Thus, (2a) is just a partial definition of the word “true”. 

But we can construct more partial definitions of the word “true” by taking other sentences, e.g., 
“Blood is red.”, “Coal is black.”, “Grass is green.”, “One and one is two.”, and so on. The obtained
partial definitions share the same form 


“____” is true iff ____. (3)


² Alfred Tarski (born Teitelbaum), 1901-1983, Polish-American mathematician and logician.

³ Philosophically, things are more intricate. When a sentence is written or spoken, an orthographic
or phonological pattern is produced. Now, is it the sentence or its pattern that (i) bears meaning; or
(ii) may bear more than one meaning; or (iii) whose meaning may depend on features of context?
In (3), the same sentence goes into each blank ____; thus, “Blood is red.” is true iff blood is red,
and “Coal is black.” is true iff coal is black. Note the difference between “____” and ____ in (3):
while ____ on the right-hand side is a sentence that says something that is or is not actually the
case in the world, the quoted blank “____” on the left-hand side is a reference to this sentence, i.e.,
it indicates that the sentence is mentioned (referred to) on the left-hand side. So, to say something
about a sentence, we quote the sentence. Since “____” and ____ differ in character, the
partial definitions obtained from (3) are not circular.

Tarski wanted to advance from partial definitions of the word “true” to a general one. Clearly, 
such a definition would conform with the prevailing conception of the notion of truth iff it included
every partial definition of the word “true” of the form (3). Such a general definition (if there is one) 
of the notion of truth would be called materially adequate. 


Definition of Truth for Natural Languages 


Thus, Tarski was led to the following question: “Can we construct a materially adequate definition 
of the notion of truth?” But he eventually realized that if we want to materially adequately define 
the notion of truth for natural language, several problems arise. Firstly, the number of partial 
definitions of the form (3) is enormous, possibly infinite, so we are forced to admit that any realistic 
definition of the notion of truth composed of partial definitions is itself doomed to be partial. 
Secondly, sentences may contain the very key words “true” (“false”, “truth”, “falsity”) that we wish
to define; take, for example, the sentence “Truth exists.” Thirdly, sentences may express assertions 
about themselves; this is because references, used to mention sentences, belong to natural language, 
so a sentence may contain a reference to itself. Such sentences are said to be self-referential. These 
sentences are not a priori controversial: take, for example, the sentence “This sentence contains five 
words.” But if self-reference is combined with other features of the language, they may become 
such: take, for example, the sentence “This sentence is false.” We leave it to the reader to infer
the consequences of each of the premises (i) the sentence is true, and (ii) the sentence is false. 
Either premise leads to its negation in spite of using intuitively certain forms of reasoning. In 
other words, if this sentence bears truth or falsity, it bears both of them, contradicting the Law of 
Excluded Middle (see p. 13). Because of this, the sentence is called the Liar Paradox. 

So, natural language contains a sentence which seems to be both true and false. Unless we can 
overcome this paradox, we cannot develop a consistent theory of truth; and without this, we cannot 
satisfactorily understand the relation between our thought and language and the world around us. 


Object Language vs. Metalanguage 


Tarski became aware of the following: (i) The freedom and power of expression of natural languages
enable the construction of paradoxical sentences; and (ii) The notion of truth for natural
language should not be discussed in that language only.

Because of this, he advocated a clear distinction between the language ℒ for which we want
to define the notion of truth, and the language ℒ̄ in which the definition will be formulated and its
implications discussed. He called ℒ the object language and ℒ̄ he called the metalanguage of ℒ.
Tarski showed that paradoxes are inevitable if both ℒ and ℒ̄ are the whole of a natural language.

Consequently, the metalanguage ℒ̄ must be sufficiently rich to enable the discussion of the object
language ℒ. We may expect that ℒ̄ will contain ℒ and also a means to refer to the objects of ℒ.
In addition, there may be some special objects in ℒ̄.

Specifically, so far we have been writing as if both ℒ and ℒ̄ were the whole of English. If,
however, we make a distinction between ℒ and ℒ̄, then every instance of (3), such as (2a), becomes
a sentence of ℒ̄ (because it says something about a sentence of ℒ). Now, since ____ on the right-hand
side of (3) represents a sentence of ℒ, it follows that ℒ̄ must contain ℒ. Next, the “____” on
the left-hand side of (3) indicates that ℒ̄ must contain a referencing mechanism, such as quotation,
so that sentences of ℒ can be mentioned in ℒ̄. Finally, “is true” in (3) suggests that ℒ̄ must contain
a special unary predicate, say Is_a_true_sentence_of_ℒ, which is usually called the truth predicate.
Tarski’s Definition of Truth for Formalized Languages


Difficulties in defining the notion of truth for natural languages ℒ led Tarski to
restrict his ambitions and adapt his program as follows:


• We should define the notion of truth for a fragment ℱ ⊂ ℒ of natural language;
and


• Although we expect that focusing on ℱ will restrict the applicability of the definition
of truth, we should try to keep the classical concept of truth essentially intact.


Which fragment ℱ of natural language should we focus on? Pragmatics suggests
that it should be a fragment (call it the restricted language) that will serve for the
purposes of science in general, i.e., the whole realm of intellectual inquiry.


But can we precisely define the notion of truth for such a restricted language?
Tarski discovered that the answer is yes, if the restricted language ℱ satisfies the
following four conditions:

1) its full vocabulary is available;

2) its rules of construction are precisely (formally) formulated;

3) its rules of construction refer exclusively to the form of expressions; and

4) the meaning (truth or falsity) of an expression depends exclusively on its form.


Restricted languages ℱ that fulfill these conditions are said to be formalized.
Clearly, symbolic languages of formal axiomatic systems (see Sect. 3.1.1) are formalized.
But also other fragments of natural languages can be formalized, though in
a less strict and abstract manner.


In summary, formalized languages are adequate for expression in logical and math- 
ematical theories while admitting the definition of the notion of truth. 

But to fully realize the latter, Tarski already showed (see Box 3.1) that one more
condition must be fulfilled:


5) there must be a clear distinction between the formalized (object) language ℱ
and its metalanguage ℱ̄.


Then the notion of truth for ℱ can be defined.


Box 3.2 shows how truth is defined for ℱ which is the Propositional Calculus P.
Box 3.2 (Definition of Truth for Propositional Calculus).


Let us define the notion of truth for the Propositional Calculus P. We will do this in three stages:
1. We define the alphabet and expressions over it. The alphabet contains symbols for individual
constants (a, b, c, …), individual variables (x, y, z, …), and logical connectives (¬, ∨, ∧, ⇒, ⇔).
We then define an expression of P to be a finite sequence of these symbols.


But not every expression will be regarded as meaningful; e.g., ∨ax∧ is such an expression.


2. We define the syntax of P. Our intention is to syntactically define the meaningful expressions 
as those constructed by certain rules of construction. The definition will also be a (syntactic) 
criterion for deciding whether or not an expression is meaningful. The definition is inductive: 


a. First, we need some a priori meaningful building blocks. The intention is that, in the third 
stage, these will denote true or false elementary mathematical assertions. So we now define: 
An atomic sentence s is the expression consisting of a single constant or variable symbol. 

b. Next, we inductively define (general) sentences as follows: 

A sentence R is either an atomic sentence s or a logical connective ¬, ∨, ∧, ⇒, ⇔ combined
with one or two sentences: ¬S, S ∨ P, S ∧ P, S ⇒ P, or S ⇔ P, where S and P are sentences.

c. Finally, we make certain that nothing else is a sentence: 

Sentences of P are exactly the expressions generated by the rules a and b. 


We will need this: For every sentence R there is a unique way to break it up into its components. 
(The proof of this proposition is left to the reader.) 


3. We define the semantics of P, that is, we assign to each sentence R a truth-value (meaning). 
We do this by an inductively defined function v: 


a. First, we define a function v₀ : s ↦ v₀(s) ∈ {true, false}, called the atomic truth assignment,
that assigns to each atomic sentence s a truth-value.


b. Next, we extend v₀ to a function v : R ↦ v(R) ∈ {true, false}, called the truth assignment,
that assigns to each general sentence R a truth-value while respecting the rules of construction
defined in the previous stage. To achieve this, the following must hold for any sentences
R, S, P and any atomic sentence s:


if R = s      then v(R) = v₀(s) ∈ {true, false};

if R = ¬S     then v(R) = v(¬S) = { true, if v(S) = false; false, otherwise };

if R = S ∨ P  then v(R) = v(S ∨ P) = { true, if v(S) = true or v(P) = true; false, otherwise };

if R = S ∧ P  then v(R) = v(S ∧ P) = { true, if v(S) = true and v(P) = true; false, otherwise };

if R = S ⇒ P  then v(R) = v(S ⇒ P) = { true, if v(S) = false or v(P) = true; false, otherwise };

if R = S ⇔ P  then v(R) = v(S ⇔ P) = { true, if v(S) = v(P); false, otherwise }.


Since every sentence R can be uniquely broken up into atomic sentences s, every v₀ has a unique
extension to v, and v(R) depends only on the values v(s) = v₀(s) for those s that occur in R.
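
The inductive extension of an atomic truth assignment to all sentences translates directly into a recursive function. The sketch below is illustrative only; it assumes a hypothetical encoding in which an atomic sentence is a string and a compound sentence is a tuple headed by one of the tags "not", "or", "and", "implies", "iff".

```python
def v(sentence, v0):
    """Extend the atomic truth assignment v0 (a dict) to a general sentence."""
    if isinstance(sentence, str):        # atomic sentence s: v(s) = v0(s)
        return v0[sentence]
    op = sentence[0]
    if op == "not":
        return not v(sentence[1], v0)
    s = v(sentence[1], v0)
    p = v(sentence[2], v0)
    if op == "or":
        return s or p
    if op == "and":
        return s and p
    if op == "implies":
        return (not s) or p
    if op == "iff":
        return s == p
    raise ValueError(f"unknown connective: {op!r}")


v0 = {"a": True, "x": False}             # an atomic truth assignment
print(v(("implies", "a", "x"), v0))      # true premise, false conclusion
print(v(("or", ("not", "a"), ("iff", "x", "x")), v0))
```

Because each sentence decomposes uniquely, the recursion visits each component once and the result depends only on the values v0 gives to the atomic sentences that occur in it.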
3.1.3 Interpretations and Models


As we have seen in Sect. 3.1.1, all the aspects of meaning have been expelled from 
the development of a theory and postponed to a later stage. We have now come to 
the point where the potential applications of the developed theory can be searched 
for. So, now the question is: “How does a formally developed theory get connected 
to actuality and, at last, gain meaning?” 

On one hand, the first ideas of the intended meaning may be involved in the very 
first stage of establishing a formal axiomatic system, that is, in setting up its sym- 
bolic language and choosing its proper axioms. This usually happens when a formal 
system is established in order to be used in an investigation of a particular, concrete 
field of interest. Some of the concrete fields of interest that we will discuss in the 
next sections are concerned with logical statements, natural numbers, and sets. A 
chosen field of interest is called the field of the intended (or standard) interpretation 
of the formal axiomatic system (and of its theory as well). It is, therefore, reason- 
able to choose an alphabet and rules of construction in such a way that the resulting 
language is capable of a precise and comfortable description of any situation in this 
field. (For example, in Example 3.3 (p. 34) we used symbols ⊕ and = with the obvious
intention that these symbols will later be interpreted as the addition and equality
in the set N of natural numbers.) In addition, together with axioms and rules of in- 
ference, the language should facilitate the analysis of the situation. Of course, when 
the development of the theory starts, the role of the intended meaning diminishes 
and syntactic issues come to the fore. 

On the other hand, one may just as well define a formal axiomatic system and 
develop its theory irrespective of any particular field of interest. One just mechani- 
cally develops the theory through the disciplined use of formal rules of construction 
and inference. 

In any case, eventually some meaning must be assigned to the theory if the theory 
is to be applied somewhere. How is this done? 


Interpretation of a Theory 


To assign a particular meaning to a theory F, one has to interpret F in a particular
mathematical structure⁴ 𝒮 = (Dom, ·). Informally, this means that one has
to choose a particular class Dom and particular functions and relations defined on
Dom, and define, for every closed formula of F, how the formula is to be understood


⁴ Informally, a mathematical structure is a class Dom endowed with additional mathematical
objects, such as functions and relations defined on Dom, and certain designated elements.
For example, groups, rings, vector spaces, and partially ordered sets are structures. Formally, a
mathematical structure is an ordered tuple 𝒮 = (Dom, R₀, …, R_k, f₀, …, f_m, e₀, …, e_n), where Dom
is a class, R₀, …, R_k are relations on Dom, f₀, …, f_m are functions from Cartesian powers of Dom
into Dom, and e₀, …, e_n are designated elements of Dom. For example, (N, =, +, *, 0, 1), the semiring
of natural numbers, is a structure. For brevity we used the dot to stand for all the particular
functions, relations, and designated elements to be considered on Dom.
as a statement about the members, functions, and relations of Dom. More formally,
the interpretation of a theory F is an ordered pair (ι, 𝒮), where ι is a mapping
ι : F → 𝒮 that assigns to each symbol, term, and formula of F its “meaning” in
Dom. (See Fig. 3.2.) The “meaning” can be an element of Dom, or a function or
relation defined on Dom. The class Dom is called the domain of the interpretation.
Usually, the mapping ι is defined inductively in accordance with F’s rules of
construction. Further details about the definition of the interpretation are in Box 3.3.


Fig. 3.2. When a theory F is interpreted in a structure 𝒮 = (Dom, ·), the mapping ι
assigns to each formula F ∈ F a formula ι(F) ∈ 𝒮


Box 3.3 (Interpretation). 


How does the interpretation ι assign meaning to a theory F in a structure 𝒮 = (Dom, ·)?
A symbol of F gets its meaning as follows:


• an individual-constant symbol c is mapped to an element of the domain: ι(c) ∈ Dom;
• a k-ary function symbol f is mapped to a k-ary function defined on Dom: ι(f) : Dom^k → Dom;
• a k-ary predicate symbol P is mapped to a k-ary relation defined on Dom: ι(P) ⊆ Dom^k;
• logical connectives get their usual meanings (∨ “or”; ∧ “and”; ⇒ “implies”; ⇔ “iff”; ¬ “not”);
• quantification symbols get their usual meanings (∀ “for all”; ∃ “exists”);
• punctuation marks get their usual meanings (comma, colon, parentheses);
• the equality symbol = always gets its usual meaning (i.e., the equality relation).


The meaning of a term of F goes with its construction:


• a term that is an individual-constant symbol c gets the same meaning as the symbol: ι(c) ∈ Dom;
• a term f(t₁, t₂, …, t_k) is mapped to ι(f)(ι(t₁), ι(t₂), …, ι(t_k)); this is an element of Dom.


A formula of F, too, gets its meaning inductively:


• a formula P(t₁, t₂, …, t_k) is mapped to ι(P)(ι(t₁), ι(t₂), …, ι(t_k)); this is a statement that is
true iff the elements ι(tᵢ) of Dom are related by the relation ι(P);

• a formula F ∨ G is mapped to the statement ι(F) ∨ ι(G). This statement is true iff at least one of
the statements ι(F), ι(G) is true;

• the formulas F ∧ G, F ⇒ G, F ⇔ G, and ¬F are mapped to statements ι(F) ∧ ι(G), ι(F) ⇒ ι(G),
ι(F) ⇔ ι(G), and ¬ι(F), respectively. The statements are true according to the well-known rules
of the Propositional Calculus P (see Box 3.2 on p. 39 and Appendix A, p. 363).


Finally, let F(x) be a formula of F with a free individual-variable symbol x. Then:


• the formula ∀xF(x) is mapped to the statement ι(∀xF(x)), which is true iff the statement
ι(F(x)) is true for every ι(x) ∈ Dom;

• the formula ∃xF(x) is mapped to the statement ι(∃xF(x)), which is true iff the statement
ι(F(x)) is true for at least one ι(x) ∈ Dom.
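
When the domain is finite, the inductive clauses of Box 3.3 can be executed directly. The following sketch rests on several assumptions that are not in the text: the domain is the finite set {0, 1, 2, 3} (the text allows arbitrary classes), an interpretation is a dictionary with the hypothetical keys "dom", "consts", "funcs" and "preds", and formulas use a nested-tuple encoding chosen only for illustration.

```python
def eval_term(term, interp, env):
    """Meaning of a term: an element of the domain."""
    kind = term[0]
    if kind == "const":
        return interp["consts"][term[1]]
    if kind == "var":
        return env[term[1]]              # value currently assigned to the variable
    if kind == "fn":
        args = [eval_term(a, interp, env) for a in term[2]]
        return interp["funcs"][term[1]](*args)
    raise ValueError(f"unknown term: {term!r}")


def holds(formula, interp, env=None):
    """Truth-value of a formula under the interpretation and variable values env."""
    env = {} if env is None else env
    kind = formula[0]
    if kind == "pred":
        args = tuple(eval_term(a, interp, env) for a in formula[2])
        return args in interp["preds"][formula[1]]
    if kind == "not":
        return not holds(formula[1], interp, env)
    if kind == "or":
        return holds(formula[1], interp, env) or holds(formula[2], interp, env)
    if kind == "and":
        return holds(formula[1], interp, env) and holds(formula[2], interp, env)
    if kind == "implies":
        return (not holds(formula[1], interp, env)) or holds(formula[2], interp, env)
    if kind == "iff":
        return holds(formula[1], interp, env) == holds(formula[2], interp, env)
    if kind in ("forall", "exists"):
        var, body = formula[1], formula[2]
        results = (holds(body, interp, {**env, var: d}) for d in interp["dom"])
        return all(results) if kind == "forall" else any(results)
    raise ValueError(f"unknown formula: {formula!r}")


dom = range(4)
interp = {
    "dom": dom,
    "consts": {"a": 0},                                   # a is mapped to 0
    "funcs": {"f": lambda n: (n + 1) % 4},                # f is mapped to successor mod 4
    "preds": {"P": {(m, n) for m in dom for n in dom if m < n}},  # P is mapped to <
}
a, x, y = ("const", "a"), ("var", "x"), ("var", "y")

p_a_fa = ("pred", "P", [a, ("fn", "f", [a])])                         # P(a, f(a))
some_x = ("exists", "x", ("pred", "P", [a, x]))                       # ∃x P(a, x)
all_some = ("forall", "x", ("exists", "y", ("pred", "P", [x, y])))    # ∀x∃y P(x, y)

print(holds(p_a_fa, interp), holds(some_x, interp), holds(all_some, interp))
```

In this finite structure ∀x∃yP(x, y) fails because the largest element has nothing above it, whereas the same sentence holds under the interpretation of P as < on all natural numbers; the truth-value depends on the chosen structure.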
Satisfiability and Validity


Note that free variable symbols have not been assigned exact meanings under the
interpretation (ι, 𝒮). Instead, we only know that a free individual-variable symbol
represents any element of the domain Dom. Similarly, a free function-variable sym- 
bol and a free predicate-variable symbol, when they exist, represent any function 
and any relation on the domain Dom, respectively. 

Consequently, free variable symbols still await someone to fix their meanings. 
This can usually be done in many ways and, in general, fixing the meanings of free 
variable symbols affects the truth-value of the formula. Let us explain this in detail. 

If a formula F is closed (has no free variable symbols), then its interpretation ι(F)
is a statement about the state of affairs in Dom. According to the classical conception
of truth (see Sect. 3.1.2), the statement ι(F) is either true or false, depending on its
conformity with the existing situation in Dom. Such a formula is P(a, b) in Fig. 3.3.


Fig. 3.3 A theory F is interpreted in the structure (N, =, <, +, *, 0, 1). The mapping ι assigns to the
predicate symbol P the usual relation < on N, and to individual-constant symbols a and b natural
numbers 3 and 5, respectively. P(a, b) is true as it conforms with reality. Since the
individual-variable symbol x is free in P(a, x), the mapping ι assigns no meaning (i.e., no number) to x.
Since P(a, x) says nothing about any specific situation in N, it is neither true nor false.


If, however, a formula F is open, then it contains free variable symbols. (For
example, P(a, x) in Fig. 3.3 is an open formula.) Since the interpretation (ι, 𝒮) did
not assign meanings to these variable symbols, ι(F) says nothing definite about the
situation in Dom. Thus ι(F) is neither true nor false at this point and, indeed, is not
yet a statement about the state of affairs in Dom. However, as soon as all the free
variable symbols are assigned meanings, ι(F) becomes either a true or a false statement
about Dom. (Clearly, free individual-variable symbols are assigned particular
elements of Dom, while free function-variable symbols and free predicate-variable
symbols are assigned particular functions and relations on Dom, respectively.)

Later, we can reassign meanings to one or more free variable symbols of the formula.
The change in the meanings of the free variable symbols generally affects the
truth-value of the statement ι(F). Regarding this we emphasize two special cases:


• if ι(F) is true for at least one assignment of meanings to its free variable symbols,
then we say that F is satisfiable under the interpretation (ι, 𝒮);

• if ι(F) is true for every assignment of meanings to its free variable symbols, then
we say that F is valid under the interpretation (ι, 𝒮) and designate this by


⊨_(ι,𝒮) F.
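
Satisfiability and validity under a fixed interpretation differ only in the quantifier over assignments ("at least one" versus "every"). A minimal sketch follows, assuming a finite domain {0, …, 9} as a stand-in for N (an assumption, since N itself cannot be enumerated) and representing an interpreted open formula as a Python function of the assignment; the names satisfiable and valid are hypothetical.

```python
from itertools import product


def assignments(free_vars, dom):
    """Generate every assignment of domain elements to the free variable symbols."""
    for values in product(dom, repeat=len(free_vars)):
        yield dict(zip(free_vars, values))


def satisfiable(statement, free_vars, dom):
    """True if the statement is true for at least one assignment."""
    return any(statement(env) for env in assignments(free_vars, dom))


def valid(statement, free_vars, dom):
    """True if the statement is true for every assignment."""
    return all(statement(env) for env in assignments(free_vars, dom))


dom = range(10)                  # finite stand-in for N (assumption of this sketch)
a = 3                            # the constant symbol a is mapped to 3, as in Fig. 3.3


def less(m, n):                  # the predicate symbol P is mapped to <
    return m < n


def p_a_x(env):                  # interpreted P(a, x)
    return less(a, env["x"])


def p_x_x(env):                  # interpreted P(x, x)
    return less(env["x"], env["x"])


def excluded(env):               # interpreted P(x, y) ∨ ¬P(x, y)
    return less(env["x"], env["y"]) or not less(env["x"], env["y"])


print(satisfiable(p_a_x, ["x"], dom), valid(p_a_x, ["x"], dom))
print(satisfiable(p_x_x, ["x"], dom))
print(valid(excluded, ["x", "y"], dom))
```

For a closed formula the list of free variables is empty, there is exactly one (empty) assignment, and satisfiable and valid coincide with plain truth.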
Logical Validity


If a formal axiomatic system has been established to investigate a particular field 
of interest, then there is an obvious interpretation of its theory; this is the intended 
interpretation. (If the intended interpretation becomes usual, it is called standard.) 
For example, the formal axiomatic system A (which will be described shortly) was
defined to formalize arithmetic, so the intended interpretation of A uses the structure
(N, =, +, *, 0, 1). However, given a formal axiomatic system F, there may be several
different interpretations (ι, 𝒮) of its theory F, each of which differs in ι or 𝒮. With
regard to this, of particular importance are those formulas of F that are valid under
every interpretation of F. Such formulas are said to be logically valid. (See Fig. 3.4.)


Fig. 3.4 A formula F that is valid under every interpretation (ι, 𝒮) is said to be logically valid


The logical validity of a formula depends only on its structure and the general 
properties of functions, relations, and quantifications; it is independent of any inter- 
pretation. We denote that F is a logically valid formula by 


⊨ F.


Observe that the logical axioms of F must be logically valid. This should not 
be surprising because logical axioms are meant to epitomize the principles of pure 
logical thought, and such principles should be (and are) independent of the current 
field of man’s interest, i.e., the field of interpretation. Thus they must remain valid, 
irrespective of the interpretation. 
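
For the propositional fragment, "valid under every interpretation" reduces to "true under every truth assignment", which can be checked by brute force. The sketch below makes that restriction explicit as an assumption; for first-order formulas no such finite enumeration of interpretations is available in general. Formulas are represented as Python functions of Boolean arguments, and is_logically_valid is a hypothetical name.

```python
from itertools import product


def implies(p, q):
    """Truth table of the connective ⇒."""
    return (not p) or q


def is_logically_valid(formula, n_atoms):
    """True if the propositional formula is true under all 2**n_atoms assignments."""
    return all(formula(*values)
               for values in product([False, True], repeat=n_atoms))


def axiom_like(f, g):            # F ⇒ (G ⇒ F)
    return implies(f, implies(g, f))


def contingent(f, g):            # F ⇒ G
    return implies(f, g)


print(is_logically_valid(axiom_like, 2))
print(is_logically_valid(contingent, 2))
```

The first formula has the shape of a typical logical axiom and passes the check; the second is true under some assignments only, so it could serve at most as a proper axiom of some particular theory.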


Model of a Theory 


What about the other kind of axiom: proper axioms? These are meant to abstract spe- 
cific basic notions and facts typical of the current field of interest (which is typically 
the intended interpretation of the theory). This leads us to the concept of a model of 
a theory. Given a theory F, it is natural to be interested only in interpretations (ι, 𝒮)
under which all the proper axioms of F are valid. (Otherwise, F would be of no use
under the interpretation.) Under such interpretations all the axioms of F are valid (as
logical axioms are already logically valid). Each such interpretation (ι, 𝒮) is called
a model of the theory F. Intuitively, a model of a theory is any field of interest that 
the theory sensibly formalizes. 
Example 3.4. (Model of a Theory) The set of natural numbers with usual operations is a model
of Peano’s Arithmetic (pp. 12 and 46). A sphere is a model of elliptic geometry (Box 2.2, p. 11). 
So is a geoid, the shape of Earth. And our Universe is a model of General Relativity Theory. 
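
Whether a given finite structure is a model can be tested by checking each proper axiom in it. The sketch below uses the group axioms purely as an illustrative choice of proper axioms (they are not taken from the text), assumes a finite domain that is closed under the operation, and writes each axiom as a Python predicate over the structure; is_model and the axiom function names are hypothetical.

```python
from itertools import product


def associativity(dom, op, e):
    """For all x, y, z: (x * y) * z = x * (y * z)."""
    return all(op(op(x, y), z) == op(x, op(y, z))
               for x, y, z in product(dom, repeat=3))


def identity(dom, op, e):
    """For all x: x * e = x and e * x = x."""
    return all(op(x, e) == x and op(e, x) == x for x in dom)


def inverses(dom, op, e):
    """For all x there exists y: x * y = e and y * x = e."""
    return all(any(op(x, y) == e and op(y, x) == e for y in dom) for x in dom)


GROUP_AXIOMS = [associativity, identity, inverses]


def is_model(dom, op, e, axioms):
    """True if every proper axiom is valid in the structure (dom, op, e)."""
    return all(axiom(dom, op, e) for axiom in axioms)


z5 = range(5)


def add_mod5(x, y):
    return (x + y) % 5


def mul_mod5(x, y):
    return (x * y) % 5


print(is_model(z5, add_mod5, 0, GROUP_AXIOMS))   # addition modulo 5 with identity 0
print(is_model(z5, mul_mod5, 1, GROUP_AXIOMS))   # multiplication modulo 5: 0 has no inverse
```

The same symbolic axioms are valid under the first interpretation and not under the second, so only the first is a model of the theory they generate.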


A theory may have several models. When a formula F of a theory F is valid in 
every model of F, the formula is said to be valid in the theory F. This is denoted by 


⊨_F F.


A formula F valid in F represents a certain mathematical Truth expressible in the
f.a.s. F. We will say that such an F represents a Truth in F. Each axiom of F represents
a Truth in F.

We prefer theories with large power of expression. After we have designed a 
formal axiomatic system F and developed (some of) its theory F, we are interested 
in the existence and kinds of its models, i.e., fields that are sensibly formalized by 
F. We are confronted with questions such as “Does F have models? If so, how many 
are there? What are the differences between them? What is their applicability?” It is 
not very important that a model be a part of the real, actual world; instead, it suffices 
that the model behave as a possible part of the world. When does that happen? It 
turns out that the theory F has to be consistent, i.e., it must not allow a derivation 
of two contradictory theorems (and, hence, has no paradoxes). But the consistency 
of a theory is a semantic notion, so it must be dealt with within the corresponding 
metatheory (metamathematics). We will return to the question of consistency soon. 


3.2 Formalization of Logic, Arithmetic, and Set Theory 


Some of the formally developed theories and their models that played an important 
role in solving the crisis in the foundations of mathematics were concerned with the 
following fields of interest: the structure and use of logical statements, the arithmetic 
of natural numbers, and the construction and use of sets. The corresponding first- 
order formal axiomatic systems and theories are called the First-Order Logic L, the 
Formal Arithmetic A, and the two Axiomatic set theories NBG and ZFC. In this 
section we will get acquainted with each of them. 


Formalization of Logic 


It was clear that in order to develop any theory in a logically unassailable way, the 
corresponding formal axiomatic system must offer all the necessary logical princi- 
ples and tools (i.e., logical symbols, logical axioms, rules of inference). This called 
for serious reflection on all the principles of pure logical reasoning, which should 
result in a formal list of all of them. Fortunately, this was done by Boole, Frege, 
Peano, Whitehead, Russell and other logicists (see Sect. 2.2.3). Formalism was able 
to gather all the undisputed logical principles in a formal axiomatic system called 
First-Order Logic (with equality) and denoted by L.

• First-Order Logic⁵ L (with equality). The alphabet of the language of L has a
potentially infinite number of individual-variable symbols x, y, ...; the equality
symbol =; logical connectives ¬, ⇒, ∀; and the usual punctuation marks. Using
the given logical connectives one may define additional logical connectives, such
as ∧, ∨, ⇔, ∃. The symbolic language is of the first order. The terms and formulas
are defined in the usual way; e.g., ∀x∀y(x = y ⇒ y = x) is a formula. Instead
of explicit logical axioms, of which there are infinitely many, there are axiom
schemas that describe all of them (see Box 3.4). The two equality axioms are
recognized as logical. There are no proper axioms. Because of this, L is said to be
a pure logic theory. The rules of inference are Modus Ponens and Generalization.


Box 3.4 (Logical Axioms of L). 


First-Order Logic L (with equality) has axiom schemas.⁶ We use these to build concrete logical
axioms. Thus, if F, G, H stand for arbitrary formulas, the following are logical axioms:

1) F ⇒ (G ⇒ F)
2) (F ⇒ (G ⇒ H)) ⇒ ((F ⇒ G) ⇒ (F ⇒ H))
3) (¬G ⇒ ¬F) ⇒ ((¬G ⇒ F) ⇒ G)
4) ∀xF(x) ⇒ F(t)
5) ∀x(F ⇒ G) ⇒ (F ⇒ ∀xG)
6) ∀x(x = x)
7) x = y ⇒ (F(x,x) ⇒ F(x,y))

The schemas 1-3 are also axioms of the Propositional Calculus P (see Appendix A, p. 363).
In 4, t is a term free for x in F(x). In 5, x is not free in F. In 7, F(x,y) arises from F(x,x) by
replacing some or all of the free occurrences of x by y (these occurrences are free for y).
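Since schemas 1-3 of Box 3.4 are shared with the Propositional Calculus, their logical validity can be checked by a truth table. The following is a minimal sketch, not part of the original text: it reads F, G, H as truth values, and the names `implies`, `schemas`, and `is_tautology` are illustrative assumptions.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Truth function of the connective ⇒."""
    return (not a) or b


# Schemas 1-3 of Box 3.4, read as truth functions of F, G, H.
schemas = {
    1: lambda F, G, H: implies(F, implies(G, F)),
    2: lambda F, G, H: implies(implies(F, implies(G, H)),
                               implies(implies(F, G), implies(F, H))),
    3: lambda F, G, H: implies(implies(not G, not F),
                               implies(implies(not G, F), G)),
}


def is_tautology(schema) -> bool:
    """True if the schema is true under all eight truth assignments."""
    return all(schema(F, G, H) for F, G, H in product([False, True], repeat=3))


results = {number: is_tautology(schema) for number, schema in schemas.items()}
print(results)
```

A truth table covers only the propositional schemas; schemas 4-7 involve quantifiers and equality and cannot be checked this way.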


First-Order Formal Axiomatic Systems and Theories 


Many other formal axiomatic systems are extensions of the First-Order Logic L 
(with equality). Each of them contains, besides everything that L has, additional 
proper symbols (i.e., constant symbols a,b,c,...; function symbols f,g,h,...; and 
predicate symbols P,Q,R,...) and proper axioms (i.e., axioms that are inspired by the
intended interpretation). The rules of inference are Modus Ponens and Generaliza- 
tion. Most often these systems use first-order language. In such cases we call them 
first-order formal axiomatic systems, and their theories we call first-order theories. 

Especially important to us will be three first-order theories: Formal Arithmetic 
A, which formalizes the arithmetic of natural numbers, and the two Axiomatic set 
theories ZFC and NBG, which formalize set theory in two different ways. 


Formalization of Arithmetic 


It was clear that in order to formally develop any nontrivial mathematical theory,
natural numbers had to be taken into account. This is because natural numbers play
a key role in the construction of other kinds of numbers (e.g., integer, rational,
algebraic, real), and consequently in the development of any nontrivial mathematical
theory (e.g., algebra, analysis). Fortunately, the grounding for this had already been
laid; the properties of natural numbers had been described in 1889 by Peano’s nine
axioms (as mentioned on p. 12). The formal axiomatic system describing arithmetic
is called the Formal Arithmetic⁷ and is denoted by A.


5 Also called First-Order Predicate Calculus.

6 An axiom schema is a formula in the metalanguage of an f.a.s., in which one or more
metalinguistic variables, called schematic variables, appear. These variables stand for any
formula (which may be required to satisfy certain conditions) of the f.a.s.


• Formal Arithmetic A. The alphabet of the language of A contains all the symbols
of L and, in addition, the following proper symbols: the individual-constant
symbol 0, the unary function symbol ′, and the binary function symbols ⊕
and ⊙. Usually, but not necessarily, one can define symbols that are abbreviations
for other symbols. The terms and formulas are constructed as usual. For instance,
x′ is a term, x ⊕ 0 = x is an open formula, and
∀x∀y∀z(x ⊙ (y ⊕ z) = (x ⊙ y) ⊕ (x ⊙ z)) is a closed formula (i.e., sentence). The
symbolic language is of the first order. In addition to logical axioms (actually
axiom schemas), which were inherited from L, Formal Arithmetic A has nine
proper axioms (see Box 3.5). The proper axioms summarize the characteristic
properties of natural numbers as discovered by Peano. There are no additional
rules of inference besides Modus Ponens and Generalization, inherited from L.

The standard model of the theory A is (ι, (ℕ, =, +, *, 0, 1)), with the domain ℕ
being the set of natural numbers and ι the interpretation that assigns meanings to
formulas in the usual way. For example, the meaning of the individual-constant
symbol 0 is the natural number 0; that is, ι(0) = 0. An individual-variable symbol
x means any natural number, that is, ι(x) ∈ ℕ. The meaning of the function
symbol ′ is the successor function, that is, ι(x′) = ι(x) + 1. Hence, 0′ means the
natural number 1, (0′)′ means the number 2, etc. The binary function symbols ⊕
and ⊙ are interpreted, as expected, as the addition and multiplication of natural
numbers. Each closed formula is mapped by ι to a statement about natural numbers,
which is either true or false. The formula x ⊕ 0 = x is open, because x is
free in it. In the standard model, the formula means: “Adding 0 to a natural number
gives the same number.” Since this is true for every assignment of a natural
number to x, the formula is valid under the standard interpretation of A.
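The way the standard interpretation assigns meanings to terms can be sketched as a small evaluator. This is an illustration under stated assumptions, not the book's construction: the class names `Zero`, `Var`, `Succ`, `Add`, `Mul` and the function `iota` are hypothetical, and checking the open formula x ⊕ 0 = x on the first hundred natural numbers is only a finite sample, not a proof of validity.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Zero:
    """The individual-constant symbol 0."""


@dataclass(frozen=True)
class Var:
    """An individual-variable symbol such as x."""
    name: str


@dataclass(frozen=True)
class Succ:
    """The unary function symbol ′ applied to a term."""
    arg: "Term"


@dataclass(frozen=True)
class Add:
    """The binary function symbol ⊕ applied to two terms."""
    left: "Term"
    right: "Term"


@dataclass(frozen=True)
class Mul:
    """The binary function symbol ⊙ applied to two terms."""
    left: "Term"
    right: "Term"


Term = Union[Zero, Var, Succ, Add, Mul]


def iota(term: Term, assignment: Dict[str, int]) -> int:
    """Meaning of a term under the standard interpretation."""
    if isinstance(term, Zero):
        return 0
    if isinstance(term, Var):
        return assignment[term.name]
    if isinstance(term, Succ):
        return iota(term.arg, assignment) + 1
    if isinstance(term, Add):
        return iota(term.left, assignment) + iota(term.right, assignment)
    if isinstance(term, Mul):
        return iota(term.left, assignment) * iota(term.right, assignment)
    raise TypeError(f"not a term: {term!r}")


two = iota(Succ(Succ(Zero())), {})  # the term (0′)′

x = Var("x")
# Finite sample of assignments for the open formula x ⊕ 0 = x.
holds_on_sample = all(
    iota(Add(x, Zero()), {"x": n}) == iota(x, {"x": n}) for n in range(100)
)
```

Here `two` is the meaning of (0′)′, and `holds_on_sample` records that both sides of x ⊕ 0 = x agree for every sampled value of x.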


Box 3.5 (Proper Axioms of A).


1) ∀x∀y∀z(x = y ⇒ (x = z ⇒ y = z))
2) ∀x∀y(x = y ⇒ x′ = y′)
3) ∀x(0 ≠ x′)
4) ∀x∀y(x′ = y′ ⇒ x = y)
5) ∀x(x ⊕ 0 = x)
6) ∀x∀y(x ⊕ y′ = (x ⊕ y)′)
7) ∀x(x ⊙ 0 = 0)
8) ∀x∀y(x ⊙ y′ = (x ⊙ y) ⊕ x)
9) F(0) ∧ ∀x(F(x) ⇒ F(x′)) ⇒ ∀xF(x), for any formula F with free x (but see also Box 3.8)


Standard interpretation of axioms: 2) Equal natural numbers have equal successors. 3) 0 is not
a successor of any natural number. 4) If the successors of two natural numbers are equal, then
the numbers are equal. 5) Adding 0 to a natural number gives the same number. 6) This axiom
describes how to add a successor of a natural number. 7) Multiplying a natural number by 0
gives 0. 8) This axiom describes how to multiply by the successor of a natural number. 9) This
is the Axiom of Mathematical Induction. It postulates the following principle. Let F(x) be a
relation on ℕ with a free variable x. (We say that F is a property.) If F(0) is true, and if for any
natural n, F(n) implies F(n + 1), then F(x) is true for all natural numbers x. (See also Box 3.8.)


7 Also called Peano Arithmetic and denoted by PA.
8 The proper axioms listed in Box 3.5 are not the same as those originally proposed by Peano.
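Read in the standard model, axioms 5-8 of Box 3.5 behave like recursive definitions of addition and multiplication in terms of the successor function. The sketch below makes that reading concrete. It assumes axiom 6 in its usual form x ⊕ y′ = (x ⊕ y)′, uses Python's `y - 1` as a stand-in for "the number whose successor is y" (a convenience that is not part of A), and the function names are illustrative.

```python
def successor(n: int) -> int:
    """Standard meaning of the function symbol ′."""
    return n + 1


def add(x: int, y: int) -> int:
    """Addition computed only from axioms 5 and 6."""
    if y == 0:
        return x                       # axiom 5: x ⊕ 0 = x
    return successor(add(x, y - 1))    # axiom 6: x ⊕ y′ = (x ⊕ y)′


def mul(x: int, y: int) -> int:
    """Multiplication computed only from axioms 7 and 8."""
    if y == 0:
        return 0                       # axiom 7: x ⊙ 0 = 0
    return add(mul(x, y - 1), x)       # axiom 8: x ⊙ y′ = (x ⊙ y) ⊕ x


# Compare with built-in arithmetic on a small finite sample.
agrees = all(
    add(x, y) == x + y and mul(x, y) == x * y
    for x in range(15)
    for y in range(15)
)
```

The comparison covers only finitely many pairs. That the definitions agree with ordinary arithmetic for all natural numbers is exactly the kind of statement that needs the induction axiom 9.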


Formalization of Set Theory 


Recall that according to Cantor’s naive set theory, the set S_P = {x | P(x)} exists for
an arbitrary property P. So, if we set P = “is a set,” there is a set S_P = 𝒰 of all sets.
But Russell deduced that then there exists a set which, paradoxically, at the same
time is and is not a member of itself (see Sect. 2.1.3). Hence, 𝒰 = S_P, the set of
all sets, is a paradoxical object as well. The obvious conclusion was that the object
𝒰 = S_P should not exist as a set. As a result, it was necessary to reconsider carefully
when, for a given property P, the object S_P = {x | x has the property P} has the status
of a set and when it does not. Which definitions of sets are to be allowed and which
are not? It was clear that if 𝒰 had existed, it would have been a huge set. So, was
it the colossal size of 𝒰 that led to Russell’s Paradox? Should we allow only those
properties P that define objects S_P of reasonable size? But what is a “reasonable”
size of a set? In light of the way mathematics should deal with large objects (sets),
two views arose and led to two axiomatic set theories:


• Axiomatic Set Theory ZFC. The first view was advocated by Zermelo,⁹
Fraenkel,¹⁰ and Skolem.¹¹ Their plan was to


find axioms that will ensure the existence of all sets needed in mathematics,
and that will, at the same time, prevent the construction of too-large sets.


Based on this idea they gradually, during 1908-30, defined a formal axiomatic
system ZF. In this system one can derive all the important theorems of Cantor’s
naive set theory, while avoiding all the known logical paradoxes. The developed
theory is called Zermelo-Fraenkel axiomatic set theory. Today, this is a standard
set theory. When the Axiom of Choice is added to its proper axioms, the theory is
denoted by ZFC. (For details about the proper axioms of ZFC, see Box 3.6.)


Fig. 3.5 Ernst Zermelo (Courtesy: See Preface)
Fig. 3.6 Abraham Fraenkel (Courtesy: See Preface)


9 Ernst Friedrich Ferdinand Zermelo, 1871-1953, German mathematician.
10 Abraham Halevi Fraenkel, 1891-1965, German (later Israeli) mathematician.
11 Thoralf Albert Skolem, 1887-1963, Norwegian mathematician.


• Axiomatic Set Theory NBG. The second view was less conservative. It was
advocated by von Neumann,¹² Bernays, and Gödel.¹³ Their belief was that


paradoxes do not follow from the existence of too-large sets,
but from allowing every (large) set to be a member of some other set.


Fig. 3.7 John von Neumann (Courtesy: See Preface)
Fig. 3.8 Paul Bernays (Courtesy: See Preface)
Fig. 3.9 Kurt Gödel (Courtesy: See Preface)


During 1925-40 they gradually defined a formal axiomatic system NBG. In addition
to the two usual basic notions (i.e., the set and the relation ∈), NBG introduced
one more basic notion, class, which is a generalization of the notion of
set. The advantage of this formal axiomatic system is that there are only a finite
number of axioms (because there are no axiom schemas). The theory developed
in this system is called von Neumann-Bernays-Gödel’s set theory. (For further
details see Box 3.7.)


What is the relationship between ZF and NBG? It was found that whatever can be
proved in ZF can also be proved in NBG. The opposite holds only for the formulas
of NBG that are also formulas of ZF. (This is because the notion of class is unknown
to ZF.) Because of this, NBG is said to be a conservative extension of the theory ZF.
Theorem 3.1 condenses all of this.


Theorem 3.1. For any formula F of ZF it holds that ⊢_ZF F iff ⊢_NBG F.


Let us note that the notion of a class is often used in ZF too, but it is not formally
defined. For example, while an object {w | w ∈ z ∧ P(w)} is surely a set when z is
a set (by the Separation Schema), an object {w | P(w)} is called a class for safety’s
sake (as it may not exist as a set). Thus, one can talk about “the class of all sets”
knowing that “the set of all sets” does not exist.


12 John von Neumann, 1903-1957, Hungarian-American mathematician.
13 Kurt Gödel, 1906-1978, Austrian-American logician, mathematician, and philosopher.


Box 3.6 (ZFC, Zermelo-Fraenkel Axiomatic Set Theory). 


Formal Axiomatic System ZFC. The alphabet has symbols from L and two proper symbols: an
individual-constant symbol ∅ and a binary relation symbol ∈. The terms and formulas are as usual.
The language is of the first order. The rules of inference are Modus Ponens and Generalization. In
addition to the logical axioms of L, there are nine proper axioms:


1) ∀x∀y(∀w(w ∈ x ⇔ w ∈ y) ⇒ x = y)   (Axiom of Extensionality)
2) ∀y∃z∀w(w ∈ z ⇔ w ∈ y ∧ P(w)), for any P with no free z.   (Separation Schema)
3) ∀x∀y∃z∀w(w ∈ z ⇔ w = x ∨ w = y)   (Axiom of Pair)
4) ∀y∃z∀w(w ∈ z ⇔ ∃x(w ∈ x ∧ x ∈ y))   (Axiom of Union)
5) ∀u∀v∀w(f(u,v) ∧ f(u,w) ⇒ v = w) ⇒ ∀y∃z∀w(w ∈ z ⇔ ∃x(x ∈ y ∧ f(x,w))),
   for any f with no free z.   (Substitution Schema)
6) ∃z(∅ ∈ z ∧ ∀x ∈ z : x ∪ {x} ∈ z)   (Axiom of Infinity)
7) ∀y∃z∀w(w ∈ z ⇔ w ⊆ y)   (Axiom of Power Set)
8) ∀y ≠ ∅ ∃x ∈ y : x ∩ y = ∅   (Axiom of Regularity)
9) ∀y(∀x ∈ y : x ≠ ∅ ⇒ ∃f ∈ (∪y)^y ∀x ∈ y : f(x) ∈ x)   (Axiom of Choice)


Standard Interpretation. The domain of the standard interpretation was described by von Neumann,
so we denote it by V. Von Neumann insisted that V contains exactly all the sets. Thus, if x
is in V, then x is a set. Now, if an element of x had not itself been a set, it would not have been
in V, and this would have led to trouble. To avoid this, von Neumann required that each element
of a set be itself a set. Such sets are called hereditary. Thus, V contains exactly all the hereditary
sets. It might appear that there are useful sets, such as {0, 1, 2}, that are not hereditary. We will see
shortly that this is not so.


Remarks. In the following we describe, for each proper axiom, the motivation for adding it to the 
axioms of ZF, its meaning, and its consequences. When interpreting a proper axiom, bear in mind 
that the individual-variable symbols mean hereditary sets. 


1) Axiom of Extensionality: A set is completely determined by its elements.
Consequences: Two sets with the same elements are equal. V contains exactly hereditary
sets. There is at most one empty set, ∅. Motivation for axiom 2: Does ∅ exist?


2) Separation Schema: The set z = {w | w ∈ y ∧ P(w)} exists.
Comment: y is any set and P is any property defined by a formula of L. Consequences: ∅ is
a set. (Proof: Take P(w) ≡ ¬(w = w).) For any sets A and B, also A ∩ B = {w | w ∈ A ∧ w ∈ B}
and A \ B = {w | w ∈ A ∧ w ∉ B} are sets. Motivation for axiom 3: We need more sets.


3) Axiom of Pair: For any x and y there is a set z containing exactly x and y.
Consequences: Ordered pair (x,y) = {{x}, {x,y}} is a set. {∅} is a set (as {∅} = {∅, ∅}); so
is {{∅}} (as {{∅}} = {{∅}, {∅}}); and so on. Defining 0 = ∅, 1 = {∅}, and 2 = {∅, {∅}}, we
obtain the numbers 0, 1, 2 and count to two. Motivation for axiom 4: We cannot define larger
numbers in this way, because we cannot construct sets with more than two members. So we
need more sets.

4) Axiom of Union: For any family y of sets x there is a set z that is the union of the sets x.
Consequences: Now we can construct sets with more than two elements and define any
natural number, e.g., 3 = 2 ∪ {2} = {∅, {∅}, {∅, {∅}}}, using the definition n + 1 = n ∪ {n}.
The definition is applicable on every n ∈ ℕ, where ℕ denotes the collection of all natural
numbers. Motivation: Is ℕ a set? We will postulate this in the Axiom of Infinity (see below).
However, it turns out that ℕ alone does not allow for the development of a full theory of
ordinals and for the use of transfinite induction. This is why we first introduce axiom 5.


5) Substitution Schema: If the domain y of a function f is a set, then its range z is also a set.
Comment: In the schema, f(x,y) denotes a function x ↦ y. Consequences: There exist certain
well-ordered sets, i.e., ordinal numbers.

6) Axiom of Infinity: There is an inductive infinite set z.
Comment: A set z is defined to be inductive if ∅ ∈ z ∧ ∀x(x ∈ z ⇒ x ∪ {x} ∈ z). A set z is
defined to be infinite if it is equipollent to a proper subset of z. Consequence: ℕ is a set.
Motivation for axiom 7: Some sets still cannot be constructed (e.g., the power set of a set). 


7) Axiom of Power Set: For any set y there is a set z containing all the subsets w of y as members. 
Motivation: The axioms of pair, union, and power set allow for the construction of larger sets 
from smaller ones. Thus, a set exists if it can be constructed only from ∅ and ℕ, which are
the only sets whose existence was postulated by axioms. What about “irregular” sets, such
as Russell’s ℛ? At this point we can still construct a set x where x ∈ y ∈ x for some set y. We
should confine constructions so that only “regular” (i.e., reasonable) sets will exist. Axiom 8
takes care of this.


8) Axiom of Regularity: Any nonempty set y contains an element x such that x and y have no
common elements.
Consequence: There can be no set x such that x ∈ y ∈ x for some set y (else, we would have
x ∈ {x,y} ∩ y and y ∈ {x,y} ∩ x, implying that {x,y} would not contain an element sharing
no elements with {x,y}, in contradiction with axiom 8). This prevents Russell’s Paradox.


9) Axiom of Choice: For any family y of nonempty sets x there is a function f that assigns to
each member x of y a member of x.
Comment: (∪y)^y denotes the set of all functions from y to the union ∪y of all elements of y.
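The constructions described in the remarks on the axioms of Separation, Pair, and Union can be imitated with Python's immutable `frozenset`. This is only a finite illustration, assuming nothing beyond the informal reading of those axioms given in Box 3.6; the helper names (`pair`, `union`, `separation`, `successor`, `numeral`, `ordered_pair`) are hypothetical. Every object built here is a set whose elements are again sets, i.e., a hereditary set.

```python
EMPTY = frozenset()  # the empty set ∅


def pair(x, y):
    """Axiom of Pair: the set {x, y}."""
    return frozenset({x, y})


def union(family):
    """Axiom of Union: the union of all members of the family."""
    return frozenset().union(*family)


def separation(y, P):
    """Separation Schema: the set {w | w ∈ y ∧ P(w)}."""
    return frozenset(w for w in y if P(w))


def successor(n):
    """n + 1 = n ∪ {n}, obtained from Pair and Union only."""
    singleton = pair(n, n)             # {n}
    return union(pair(n, singleton))   # ∪{n, {n}} = n ∪ {n}


def numeral(k: int):
    """The hereditary set that represents the natural number k."""
    n = EMPTY
    for _ in range(k):
        n = successor(n)
    return n


def ordered_pair(x, y):
    """Ordered pair (x, y) = {{x}, {x, y}}."""
    return pair(pair(x, x), pair(x, y))


three = numeral(3)
# A ∩ B via Separation, with A = 3 and B = {1, 2}.
common = separation(three, lambda w: w in pair(numeral(1), numeral(2)))
```

With these definitions `three` has exactly the elements 0, 1, and 2, which matches 3 = {∅, {∅}, {∅, {∅}}} in remark 4, and `ordered_pair(numeral(0), numeral(1))` differs from `ordered_pair(numeral(1), numeral(0))`.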


Box 3.7 (NBG, von Neumann-Bernays-Gödel’s Axiomatic Set Theory).


Basic Ideas. In addition to the two usual basic notions of a set and a membership relation €, there 
is also the notion of a class. Each set is also a class, but some classes are not sets. Classes that are 
not sets are called proper classes. A characteristic of a proper class is that it is not a member of 
any class (and hence, of any set). The intention of such a definition of a class is now clear: proper 
classes should represent collections that are too large to be sets, and non-proper classes (that is, 
sets) should represent all the reasonably large sets that are used in mathematics. 

Drawing a distinction between sets and proper classes enables us to prevent paradoxes. Let us
see how this works on Russell’s Paradox. Define the class ℛ = {S | S is a set ∧ S ∉ S}. Like
every class, ℛ either is or is not a member of itself. Let us see whether ℛ, even as a class, gives
rise to Russell’s Paradox ℛ ∈ ℛ ⇔ ℛ ∉ ℛ. If ℛ ∈ ℛ, then ℛ is a set that is a member of itself,
and hence ℛ ∉ ℛ. This is a contradiction. Assume now that ℛ ∉ ℛ. This means that ℛ is not a set
or ℛ ∈ ℛ. The latter alternative is impossible because of the assumption, which leaves us the first
alternative: ℛ is not a set. Therefore, ℛ is a proper class. We have seen that this deduction, which
in Cantor’s naive set theory led to Russell’s Paradox, now luckily ends up with the conclusion that
ℛ is a proper class. In a similar way Burali-Forti’s and Cantor’s paradoxes are eliminated. So how
was this system formally defined?

Formal Axiomatic System NBG. There are many similarities with ZF. The alphabet has all the
symbols of L and two proper constant symbols ∅ and ∈. The terms and formulas are built as
usual. The symbolic language is of the first order. The rules of inference are Modus Ponens and
Generalization. In addition to the logical axioms inherited from L, there are proper axioms. How
were these selected?

Recall that Cantor’s Axiom of Abstraction postulated that “Every property P defines a set Sp.” 
As we have seen, the authors of ZF limited the properties P so that Sp are reasonable and not too 
large. In contrast, the authors of NBG argued as follows: 


If we demanded that Sp be a class, then Sp might not be a set (but be a proper class). Hence, 
the fear of too-large sets might become superfluous. But then, could P again be an arbitrary 
property? Could we declare the following generalization of the Axiom of Abstraction: “Ev- 
ery property P defines a class Sp”? 


It turned out that such an axiom is bad, for it would allow ℛ′ = {S | S is a class ∧ S ∉ S} to
be a class, and this class would again lead to Russell’s Paradox ℛ′ ∈ ℛ′ ⇔ ℛ′ ∉ ℛ′. Thus, more
caution was needed in order to generalize the Axiom of Abstraction. The result of the search is 
the following Axiom of Class Existence: Every property P of sets defines a class. Hence, a class 
cannot be determined by a property of proper classes, but only by a property of sets. This finally 
leads to an informal definition of a class: 


A class is a collection of sets that have in common a property P: {S | S is a set ∧ P(S)}.


Of course, the sets S must exist in the first place. This is ensured by other axioms (as in ZF). For 
this NBG has three groups of proper axioms. 

The first group initially consisted only of the Axiom of Class Existence to establish the notion 
of a class. This axiom is actually an axiom schema, because it represents an infinite number of 
axioms, one for each property P of sets. It turned out that the schema can be replaced by only eight 
axioms! These axioms now constitute the first group. In the second group is the following Axiom 
of Extensionality: Two classes are equal if they have the same elements. The third group consists 
of axioms that, similarly to ZF, postulate the existence of sets obtained either ex nihilo (such as ∅
and ℕ) or by a construction from existing ones.


Second-Order Formal Axiomatic Systems and Theories 


It turned out that certain properties of mathematical objects cannot be defined in 
a first-order symbolic language. So, the basic notions and axioms referring to such 
properties cannot be stated in these languages. Consequently, there are no first-order 
theories about such objects. In such cases, it often turns out that the quantifiers ∀
and ∃ should be applicable to function-variable symbols and/or predicate-variable
symbols, something that is not allowed in first-order symbolic languages. For instance,
first-order languages do not enable us to define the completeness of the set
ℝ of real numbers, or the concepts of torsion group and mathematical induction
(see Box 3.8 for further details). If, however, the action of quantifiers is expanded to 
function-variable or predicate-variable symbols, we obtain a second-order symbolic 
language, a second-order formal axiomatic system, and a second-order theory. 
Unfortunately, second-order theories are not as useful as they seem. This is because
they lack some important properties that are characteristic of first-order theories.
For example, for any theory it is important to know whether the theory has
models, how many there are, what their properties are and what the relations between
them are. Such questions are dealt with in model theory. Two of its important
theorems are the Compactness Theorem¹⁴ and the “downward” Löwenheim-Skolem
Theorem.¹⁵ But, in general, the theorems do not hold in second-order theories. Thus,
second-order languages and theories may be more powerful in their expression, but
they are less amenable to a metamathematical treatment.

As we will see, the deficiencies of second-order theories had no opportunity to
manifest themselves and influence formalism, because it was not long before a
disappointment about first-order theories came as a result of Gödel’s Incompleteness
Theorems.


Box 3.8 (Expression of First-Order Languages). 


We give examples of where a second-order language is needed. 


• Mathematical Induction. The Axiom of Mathematical Induction is usually written as the
first-order defining formula F(0) ∧ ∀i(F(i) ⇒ F(i+1)) ⇒ ∀nF(n), to which an explanation is
added stating that F(x) can be any formula with x as a free variable; see Box 3.5. (The formula
F(x) describes a property of x.) But observe that the defining formula is actually an axiom
schema; only after F has been substituted with an actual formula is a particular axiom (i.e.,
one of infinitely many) obtained. In order to write in a symbolic language that the principle of
mathematical induction holds for any formula F, we have to add ∀F to the defining formula.
This gives us a second-order formula ∀F(F(0) ∧ ∀i(F(i) ⇒ F(i+1)) ⇒ ∀nF(n)). First-order
symbolic language is too weak to describe completely the principle of mathematical induction.


• Completeness of ℝ. A fundamental property of the set ℝ of real numbers is completeness: every
nonempty subset of ℝ that is bounded above has a least upper bound. How can we express this
in a symbolic language? Let us start generally. Let ℝ be the domain of an interpretation, and
B an arbitrary property of the sets S ⊆ ℝ. We can describe the fact that B holds for every set
S ⊆ ℝ by the formula ∀S : B(S). But this formula does not belong to any first- or second-order
language, because ∀ refers to sets of elements of the domain. By viewing subsets of ℝ as
sets S_P = {x ∈ ℝ | P(x)}, where P are predicates, the formula transforms into the second-order
formula ∀P : B(S_P), because ∀ now binds the predicate-variable P. If we fix the property B to
B(S_P) ≡ “if S_P is bounded above then it has an l.u.b.,” we obtain the second-order formula
∀P(∃b∀x(P(x) ⇒ x ≤ b) ⇒ ∃ℓ∀u(∀x(P(x) ⇒ x ≤ u) ⇔ ℓ ≤ u)) stating that ℝ is complete.
That is: For every P, if S_P is bounded above by b, then S_P has an l.u.b. (an ℓ which is ≤
any upper bound u of S_P).


• Torsion Groups. Let (G, ·) be a group with unit e. We say that G is a torsion group if for every
a ∈ G there is an n ≥ 1 such that aⁿ = e. How can we define the property P(G) ≡ “G is a
torsion group”? Let the domain of interpretation be (G, ·). Let us try the seemingly obvious:
P(G) ≡ ∀a ∈ G ∃n ≥ 1 aⁿ = e. Notice that the interpretation of this formula is a proposition
that also considers natural numbers, but these are not in the domain of interpretation. To fix
that we might define P(G) ≡ ∀a ∈ G (a = e ∨ a·a = e ∨ a·a·a = e ∨ ...) and thus avoid
mentioning natural numbers. However, this formula is no longer finite and, hence, not in a
first-order symbolic language. In any case, it turns out that there is no finite set of first-order
formulas whose models are precisely the torsion groups.


14 Compactness Theorem: A first-order theory has a model if every finite part of the theory does.
15 Löwenheim-Skolem Theorem: If a theory has a model, then it has a countable model. The
generalization is called the “upward” Löwenheim-Skolem Theorem and states: If a first-order theory has
an infinite model, then for every infinite cardinal κ it has a model of size κ. Since such a theory is
unable to pin down the cardinality of its infinite models, it cannot have exactly one infinite model
(up to isomorphism).
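The defining condition of a torsion group, “for every a there is an n ≥ 1 with aⁿ = e,” can be seen at work in a small finite group. The sketch below is an illustration chosen here, not an example from the text: it uses the multiplicative group of units modulo 15, and the function name `order` is hypothetical. Note that the loop searches through natural numbers n, which are not elements of the group; this is the very feature that the first-order language of groups cannot express directly.

```python
from math import gcd


def order(a: int, m: int) -> int:
    """Smallest n >= 1 with a**n ≡ 1 (mod m); a must be a unit modulo m."""
    n, power = 1, a % m
    while power != 1:
        power = (power * a) % m
        n += 1
    return n


m = 15
# The group elements: residues that are invertible modulo m.
units = [a for a in range(1, m) if gcd(a, m) == 1]
# For each element a, the exponent n that witnesses a**n = e.
orders = {a: order(a, m) for a in units}
print(orders)
```

Every finite group is a torsion group, so the search always terminates here. The exponent n depends on the element a, and in an infinite torsion group there need not be a single n that works for all elements.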
A formal axiomatic system is determined by a symbolic language, a set of axioms,
and a set of rules of inference. The axioms are logical or proper. All of this is the
initial theory associated with the formal axiomatic system. The theory is then
systematically extended into a larger theory. The development is formal: new notions
must be defined from existing ones, and propositions must be formally proved, i.e.,
derived from axioms or previously proved formulas. Each proved formula is a theorem
of the theory. Such a syntax-oriented and rigorous development of the theory
is a mechanical process and because of this better protected from man’s fallibility.

The notion of truth is so fundamental that philosophers have been trying to cap- 
ture it in a satisfying definition since Aristotle. Even today, there are debates about 
the possibility of achieving such a definition for natural language. However, Tarski 
demonstrated that this is possible for suitably formalized fragments of natural lan- 
guage. For pragmatic reasons we usually choose such fragments of natural language 
that allow scientific discourse. 

At any stage of the development, the theory can be interpreted in a chosen field 
of interest, called the domain of interpretation. The interpretation defines how a 
formula must be understood as a statement about the elements of the domain. Each 
interpretation of a theory under which all the axioms of the theory are valid is called 
a model of the theory. A theory may have several different models. A model is not 
necessarily a part of the real world. 

Formal axiomatic systems both protected the development of theories from man’s 
fallibility and preserved the freedom given by the hypothetical axiomatic system. 
The three particular fields of mathematics whose formal axiomatic systems and 
theories played a crucial role in the events that followed are logic, arithmetic, and 
Cantor’s set theory. The corresponding formal axiomatic systems are First-Order 
Logic L, Formal Arithmetic A, and the two Axiomatic Set Theories ZF and NBG. 


Chapter 4
Hilbert’s Attempt at Recovery


If something is consistent, no part of it contradicts or conflicts 
with any other part. If something is complete, it contains all the 
parts that it should contain. If something is decidable, we can 
establish the fact of the matter after considering the facts. 


Abstract Hilbert’s Program was a promising formalistic attempt to recover mathematics.
It would use formal axiomatic systems to put mathematics on a sound footing
and eliminate all the paradoxes. Unfortunately, the program was severely shaken
by Gödel’s astonishing and far-reaching discoveries about the general properties of
formal axiomatic systems and their theories. Thus Hilbert’s attempt fell short of
formalists’ expectations. Nevertheless, although shattered, the program left open an
important question about the existence of a certain algorithm, a question that was
to lead to the birth of Computability Theory.


4.1 Hilbert’s Program 


In this section we will describe Hilbert’s Program. In order to understand the goals 
of the program, we will first define the fundamental metamathematical problems of 
formal axiomatic systems and their theories. Then we will describe the goals of the 
program and Hilbert’s intentions that influenced the program. 


4.1.1 Fundamental Problems of the Foundations of Mathematics 


The rigor and syntactic orientation of formal axiomatic systems not only protected 
them from man’s fallibility, but also enabled a precise definition and investigation of 
various metamathematical problems, i.e., questions about their theories. Naturally, 
these questions were closely linked to the burning question of protecting mathemat- 
ics from paradoxes. They are called the problems of the foundations of mathematics. 
Of special importance to the history of the notion of algorithm and Computability 
Theory will be the following four problems of the foundations of mathematics:

1. Consistency Problem. Let F be a first-order theory. (So we can use the logical
connective ¬.) Suppose that there is a closed formula F in F such that both F
and ¬F are derivable in F. Then the contradictory formula F ∧ ¬F immediately
follows! We say that such a theory is inconsistent. But we can readily show that
in an inconsistent theory any formula of the theory can be derived! (See Box 4.1.)


Box 4.1 (Derivation in an Inconsistent Theory). 


Suppose that formulas F and ¬F are derivable in F and let A be an arbitrary formula in F. Then
we have the following derivation of A:


1. F                                    Supposition.
2. ¬F                                   Supposition.
3. F ⇒ (¬A ⇒ F)                         Ax. 1 of L (Box 3.4) with ¬A instead of G (i.e., ¬A/G).
4. ¬A ⇒ F                               From 1. and 3. by MP.
5. ¬F ⇒ (¬A ⇒ ¬F)                       Ax. 1 of L (Box 3.4) with ¬F/F and ¬A/G.
6. ¬A ⇒ ¬F                              From 2. and 5. by MP.
7. (¬A ⇒ ¬F) ⇒ ((¬A ⇒ F) ⇒ A)           Ax. 3 of L (Box 3.4) with A/G.
8. (¬A ⇒ F) ⇒ A                         From 6. and 7. by MP.
9. A                                    From 4. and 8. by MP.


An inconsistent theory has no cognitive value. This is why we seek consistent 
theories (Fig. 4.1). So the following metamathematical question is important: 


Consistency Problem: “Is a theory F consistent?” 


For example, in 1921, Post proved that the Propositional Calculus P is consistent. 


Fig. 4.1 In a consistent theory F, for no formula F are both F and ¬F derivable in F


2. Syntactic Completeness Problem. Let F be a consistent first-order theory and
F an arbitrary closed formula of the theory. Since the theory is consistent, F and ¬F
are not both derivable in it. But what if neither F nor ¬F is derivable? In such a case
we say that the formula F is independent of the theory (as it is neither provable nor
refutable in it). This situation is undesirable. We prefer that at least one of F and ¬F
be derivable. When this is the case for every closed formula of the theory, we say that
the theory is syntactically complete (see Fig. 4.2). Thus, in a consistent and
syntactically complete theory every closed formula is either provable or refutable.
Informally, there are no “holes” in such a theory, i.e., no closed formulas independent
of the theory. So, the next metamathematical question is important:


Syntactic Completeness Problem: “Is a theory F syntactically complete?”


The answer tells us whether F guarantees that, for no formula of F, the search 
for either a proof or a refutation of the formula is a priori doomed to fail. For 
instance, we know that the Propositional Calculus P and First-Order Logic L are 
not syntactically complete. 


Fig. 4.2 In a syntactically complete theory F it holds, for every formula F, that F or
¬F is derivable in F


3. Decidability Problem. Let F be a consistent and syntactically complete first-
order theory, and F an arbitrary formula of F. The derivation of F or ¬F may
be highly intricate. Consequently, the search for a formal proof or refutation of
F is inevitably dependent on our ingenuity. As long as F is neither proved nor 
refuted, we can be sure that this is because of our lack of ingenuity (because F is 
syntactically complete). Now suppose that there existed an algorithm—called a 
decision procedure—capable of answering—in finite time, and for any formula 
F of F—the question “Is F derivable in F?” Such a decision procedure would be 
considered effective because it could decide, for any formula, whether or not it is 
a theorem of F. When such a decision procedure exists, we say that the theory F 
is decidable. (See Fig. 4.3.) So, the metamathematical question, called the 


Decidability Problem: “Is a theory F decidable?” 


is important because the answer tells us whether F allows for a systematic (i.e., 
mechanical, algorithmic) and effective search of formal proofs. In a decidable 
theory we can, at least in principle, develop the theory without investing our 
ingenuity and creativity. For instance, the Propositional Calculus P is known to 
be a decidable theory; the corresponding decision procedure uses the well-known 
truth tables and was discovered in 1921 by Post.¹


Fig. 4.3 In a decidable theory F there is a decision procedure (algorithm) that tells, for
an arbitrary formula F, whether or not F is derivable in F


1 Emil Leon Post, 1897-1954, American mathematician, born in Poland.

4. Semantic Completeness Problem. Let F be a consistent first-order theory. When
interpreting F, we are particularly interested in the formulas that are valid in
the theory F, i.e., formulas that are valid in every model of F. Such formulas
represent Truths in F (see p. 44). Now, we know that all the axioms of F, both
logical and proper, represent Truths in F. If, in addition, the rules of inference of
F preserve the property “to represent a Truth in F”, then also every theorem of F
represents a Truth in F. When this is the case, we say that F is sound. Informally,
in a sound theory we cannot deduce something that is not a Truth, so the theory
may have cognitive value. Specifically, it can be proved that Modus Ponens and
Generalization preserve the Truth-ness of formulas. So we can assume that the
theories we are interested in are sound.

To summarize, a theory F is sound when the following holds: If a formula F is
a theorem of F, then F represents a Truth in F; in short


If ⊢_F F then ⊨_F F. (F is sound)


However, the opposite may not hold: A sound theory F may contain a formula
that represents a Truth in F, yet the formula is not derivable in F. This situation
can arise when F lacks some axiom(s).

Of course, we would prefer a sound theory whose axioms suffice for deriving
every Truth-representing formula in the theory. When this is the case, the theory
is said to be semantically complete. Thus, a theory F is semantically complete
when the following holds: A formula F is a theorem of F if and only if F represents
a Truth in F; in short


⊢_F F if and only if ⊨_F F. (F is semantically complete)


The metamathematical question


Semantic Completeness Problem: “Is a theory F semantically complete?”


is of the greatest importance because the answer tells us whether the syntactic
property “to be a theorem of F” coincides with the semantic property “to represent
a Truth in F” (see Fig. 4.4). That Propositional Calculus P and First-Order
Logic L are semantically complete theories was proved by Post (1921) and Gödel
(1930), respectively. (The latter is known as Gödel’s Completeness Theorem.)


NB If there is an interpretation of F and a formula G of F such that G is
valid under that interpretation but not a theorem of F, then F is not semantically
complete. This will happen in Sect. 4.2.3 in the standard model (N, =, +, *, 0, 1) of A.
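
Returning to the Decidability Problem above, the truth-table procedure for the Propositional Calculus P can be sketched in a few lines. This is only an illustration: the encoding of formulas as nested tuples and the function names are assumptions made here, not notation from the text. Since P is semantically complete (Post, 1921), deciding whether a formula is a tautology also decides whether it is derivable in P.

```python
from itertools import product

# Formulas are nested tuples (an encoding assumed for this sketch):
#   "p"              a propositional variable
#   ("not", X)       negation
#   ("and", X, Y)    conjunction
#   ("imp", X, Y)    implication X => Y

def variables(f):
    """Collect the names of the variables occurring in formula f."""
    if isinstance(f, str):
        return {f}
    return set().union(*(variables(sub) for sub in f[1:]))

def evaluate(f, assignment):
    """Truth value of f under an assignment of True/False to its variables."""
    if isinstance(f, str):
        return assignment[f]
    op = f[0]
    if op == "not":
        return not evaluate(f[1], assignment)
    if op == "and":
        return evaluate(f[1], assignment) and evaluate(f[2], assignment)
    if op == "imp":
        return (not evaluate(f[1], assignment)) or evaluate(f[2], assignment)
    raise ValueError(f"unknown connective: {op!r}")

def is_tautology(f):
    """Decision procedure: try every row of the truth table of f."""
    names = sorted(variables(f))
    return all(
        evaluate(f, dict(zip(names, values)))
        for values in product([False, True], repeat=len(names))
    )

# (F and not F) => A: from a contradiction anything follows (compare Box 4.1).
ex_falso = ("imp", ("and", "F", ("not", "F")), "A")
print(is_tautology(ex_falso))             # True
print(is_tautology(("imp", "A", "F")))    # False
```

The procedure always halts because a formula with k variables has exactly 2^k truth-table rows, so no ingenuity is needed, only patience.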


Fig. 4.4 In a semantically complete theory F a formula F is derivable iff F is valid in F
(i.e., valid in every model of F)


4.1.2 Hilbert’s Program


Let us now return to the foundational crisis of mathematics at the beginning of 
the twentieth century. During 1920-28, Hilbert gradually formed a list of goals— 
called Hilbert’s program—that should be attained in order to base the whole of 
mathematics on new foundations that would prevent paradoxes. 


Fig. 4.5 David Hilbert 
(Courtesy: See Preface) 


Hilbert’s Program consisted of the following goals: 


A. find an f.a.s. M having a computable set of axioms and capable of deriving 
all the theorems of mathematics; 

B. prove that the theory M is semantically complete; 

C. prove that the theory M is consistent; 

D. construct an algorithm that is a decision procedure for the theory M. 


Note that the goal D asks for a constructive proof that M is decidable, i.e., a proof
by exhibiting a decision procedure for M. Let us denote this procedure by D_Entsch,
since Hilbert called the Decidability Problem for M the Entscheidungsproblem.


Intention 


Hilbert’s intention was that, having attained the goals A, B, C, D, every mathematical
statement would be mechanically verifiable. How could a statement be verified? We
should first write the statement as a sentence, i.e., a closed formula F of M (hence
goal A). How would we find out whether the statement represented by F is a
mathematical Truth? If M were a semantically complete theory (hence goal B), we would
be sure that F represents a Truth in M iff F is a theorem of M. Therefore, we could
focus on syntactic issues only. If M were a consistent theory (hence goal C), the
formulas F and ¬F could not both be theorems. Finally, we would apply the decision
procedure D_Entsch (hence goal D) to find out which of F and ¬F is a theorem of M.


NB Hilbert expected that M would be syntactically complete.

What is more, by using the decision procedure D_Entsch, mathematical statements
could be algorithmically, that is, mechanically classified into Truths and non-
Truths. There would be no need for human ingenuity in mathematical research;
one would just systematically generate mathematical statements (i.e., sentences),
check them by D_Entsch, and collect only those that are Truths.²


Finitism 


Hilbert expected that the consistency and semantic completeness of M could be 
proved only by analyzing the syntactic properties of M and its formulas. To avoid 
deceptive intuition, he defined the kind of reasoning, called finitism, one should 
preferably use in such an analysis. Here, Hilbert approached the intuitionist view of 
infinity. In particular, proofs of the goals B and C should be finitist in the sense that 
they should use finite objects and methods that are constructive, at least in principle. 
For instance, the analysis should avoid actual infinite sets, the use of the Law of 
Excluded Middle in certain existence proofs, and the use of transfinite induction. 
(We will informally describe what transfinite induction is in Box 4.7 on p. 68.) 


4.2 The Fate of Hilbert’s Program 


After Hilbert proposed his program, researchers started investigating how to attain 
the goals A, B, C, and D. While the research into the formalization of mathematics 
(goal A) and the decidability of mathematics (goal D) seemed to be promising, it 
took only a few years before Gödel discovered astonishing and far-reaching facts
about the semantic completeness (goal B) and consistency (goal C) of formally
developed mathematics. In this section we will give a detailed explanation of how this
happened.


4.2.1 Formalization of Mathematics: Formal Axiomatic System M 


So, what should the sought-for formal axiomatic system M look like? Preferably it 
would be a first-order or, if necessary, second-order formal axiomatic system. Pro- 
bably it would contain one of the formal axiomatic systems ZFC or NBG in order to 
introduce sets. Perhaps it would additionally contain some other formal axiomatic 
systems that formalize other fields of mathematics (analysis, for example). Despite 
these open questions, it was widely believed that M should inevitably contain the 
following two formal axiomatic systems (see Fig. 4.6): 


2 Today, one would use a computer to perform these tasks. Of course, when Hilbert proposed his
program, there were no such devices, so everything was a burden on the human processor.

1. First-Order Logic L. This would bring to M all the tools needed for the logically
unassailable development of the theory M, that is, all mathematics. The trust in
L was complete after the consistency of L had been proved with finitist methods,
and after Gödel and Herbrand had proved the semantic completeness of L
(Herbrand even with finitist methods).


2. Formal Arithmetic A. This would bring natural numbers to M. Since natural num- 
bers play a key role in the construction of other kinds of numbers (e.g., rational, 
irrational, real, complex), they are indispensable in M. 


Fig. 4.6 Mathematics as a theory M belonging to the formal axiomatic system M
(diagram labels: mathematics = M, theorems, unprovable formulas)


4.2.2 Decidability of M: Entscheidungsproblem 


Recall that the goal of the Entscheidungsproblem was: Construct an algorithm
D_Entsch that will, for any formula F of M, decide whether F is derivable in M; in
short, whether ⊢_M F. (See Fig. 4.7.)


Hopes that there was such a D_Entsch were raised by the syntactic orientation of
formal axiomatic systems and their theories, and, specifically, by their view of a
derivation (formal proof) as a finite sequence of language constructs built according
to a finite number of syntactic rules. At first sight, the search for a derivation of a
formula F could proceed, at least in principle, as follows:


systematically generate finite sequences of symbols of M, and
for each newly generated sequence
    check whether the sequence is a proof of F in M;
    if so, then answer YES and halt.


If, in truth, F were derivable in M, and if a few reasonable assumptions held (see
Box 4.2), then the procedure would find a formal proof of F. However, if in truth
F were not derivable in M, the procedure would never halt, because it would keep
generating and checking candidate sequences. But notice that if a newly generated
sequence is not a proof of F, it may still be a proof of ¬F. So we check this possibility
too. We obtain the following improved procedure:


systematically generate finite sequences of symbols of M, and
for each newly generated sequence
    check whether the sequence is a proof of F in M;
    if so, then answer YES and halt
    else check whether the sequence is a proof of ¬F in M;
    if so, then answer NO and halt.


Assuming that either F or ¬F is provable in M, the procedure always halts. In
Hilbert’s time it was widely believed that M would be syntactically complete.
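
The improved procedure can be tried out on a toy stand-in for M. The following is a minimal sketch under loudly stated assumptions: the "theory" only talks about statements of the form "n is composite", a proof is a string "p*q" with p*q = n, and a proof of the negation is the list of nonzero remainders of n modulo 2, ..., n-1. The alphabet, these proof formats, and all function names are invented for illustration and are not part of the text. What the sketch does share with the procedure above is its shape: blind generation of all finite symbol sequences, with a mechanical check of each.

```python
from itertools import count, product

ALPHABET = "0123456789*,"   # symbols of the toy theory (an assumption of this sketch)

def sequences():
    """Systematically generate all finite strings over ALPHABET, shortest first."""
    for length in count(1):
        for symbols in product(ALPHABET, repeat=length):
            yield "".join(symbols)

def is_proof(seq, n):
    """Toy proof of 'n is composite': a string 'p*q' with p, q > 1 and p*q == n."""
    parts = seq.split("*")
    if len(parts) != 2 or not all(part.isdigit() for part in parts):
        return False
    p, q = int(parts[0]), int(parts[1])
    return p > 1 and q > 1 and p * q == n

def is_refutation(seq, n):
    """Toy proof of the negation: the nonzero remainders n % 2, ..., n % (n-1)."""
    parts = seq.split(",")
    if not all(part.isdigit() for part in parts):
        return False
    remainders = [int(part) for part in parts]
    return 0 not in remainders and remainders == [n % d for d in range(2, n)]

def decide(n):
    """Answer YES or NO to 'is n composite?' by blind search (assumes n >= 3)."""
    for seq in sequences():
        if is_proof(seq, n):
            return "YES"
        if is_refutation(seq, n):
            return "NO"

print(decide(9))   # YES (the sequence "3*3" is eventually generated)
print(decide(5))   # NO  (the sequence "1,2,1" is eventually generated)
```

The search halts for every n >= 3 only because each such statement has either a proof or a refutation in this toy setting, which mirrors the assumption that M is syntactically complete. Without that assumption the loop in `decide` could run forever.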


Fig. 4.7 D_Entsch answers in finite time with YES or NO the question “Is F a theorem of M?”


Box 4.2 (Recognition of Derivations). 


Is a sequence of symbols a derivation of F in M? Since the sequence is finite, it can only be 
composed of finitely many formulas of M. There are also finitely many rules of inference in M that 
can connect these formulas in a syntactically correct derivation of F. Assuming that we can find 
out in finite time whether a formula (contained in the sequence) directly follows from a finite set 
of premises (contained in the sequence) by a rule of inference of M, we can decide in finite time 
whether the sequence is a derivation of F in M. To do this, we must systematically check a finite 
number of possible triplets (formula, set of premises, rule). 

Here, an assumption is needed. Since a premise can also be an axiom of M, we must assume that 
there is a procedure capable of deciding in finite time whether a formula is an axiom of M. Today, 
we say that such a set of axioms is computable (and the corresponding theory M is computably 
axiomatizable). Hence goal A of Hilbert’s Program. 
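
The recognition of derivations described in this box can be sketched as a small checker. It is a simplified illustration, not the full system: only Modus Ponens is implemented (Generalization is omitted), only two axiom schemas are recognized, and their shapes, F ⇒ (G ⇒ F) and (¬G ⇒ ¬F) ⇒ ((¬G ⇒ F) ⇒ G), are inferred from the instances used in Box 4.1 rather than taken from Box 3.4. The tuple encoding and the function names are likewise assumptions.

```python
# Formulas as nested tuples (an encoding assumed for this sketch):
#   "F"             atomic formula
#   ("not", X)      negation
#   ("imp", X, Y)   implication X => Y

def imp_parts(f):
    """Return (antecedent, consequent) if f is an implication, else None."""
    if isinstance(f, tuple) and len(f) == 3 and f[0] == "imp":
        return f[1], f[2]
    return None

def is_ax1(f):
    """Instance of the schema F => (G => F)?"""
    outer = imp_parts(f)
    if outer is None:
        return False
    inner = imp_parts(outer[1])
    return inner is not None and inner[1] == outer[0]

def is_ax3(f):
    """Instance of the schema (not G => not F) => ((not G => F) => G)?"""
    outer = imp_parts(f)
    if outer is None:
        return False
    left, right = imp_parts(outer[0]), imp_parts(outer[1])
    if left is None or right is None:
        return False
    middle = imp_parts(right[0])
    if middle is None:
        return False
    g, f_ = right[1], middle[1]
    return left == (("not", g), ("not", f_)) and middle[0] == ("not", g)

def is_derivation(steps, goal, premises=()):
    """Each step must be a premise, an axiom, or follow from earlier steps by MP."""
    for i, step in enumerate(steps):
        earlier = steps[:i]
        by_mp = any(("imp", a, step) in earlier for a in earlier)
        if not (step in premises or is_ax1(step) or is_ax3(step) or by_mp):
            return False
    return bool(steps) and steps[-1] == goal

# The nine steps of Box 4.1: from F and not F derive an arbitrary A.
nA, nF = ("not", "A"), ("not", "F")
box_4_1 = [
    "F",
    nF,
    ("imp", "F", ("imp", nA, "F")),
    ("imp", nA, "F"),
    ("imp", nF, ("imp", nA, nF)),
    ("imp", nA, nF),
    ("imp", ("imp", nA, nF), ("imp", ("imp", nA, "F"), "A")),
    ("imp", ("imp", nA, "F"), "A"),
    "A",
]
print(is_derivation(box_4_1, "A", premises=("F", nF)))   # True
```

Every check is a finite comparison of finite objects, which is the point of the box: recognizing a derivation is mechanical provided that membership in the set of axioms (here `is_ax1` and `is_ax3`) is itself decidable.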


If the theory M were consistent, then at most one of F and ¬F would be derivable
in M. If, in addition, M were syntactically complete, then at least one of F and ¬F
would be derivable in M. Consequently, for an arbitrary F of such an M, the procedure
would halt in a finite time and answer either YES (i.e., F is a theorem of M)
or NO (i.e., ¬F is a theorem of M). So, M would be decidable and the above procedure
would be the decision procedure D_Entsch. Thus we discovered the following
relationship:


if M is consistent and M is syntactically complete
then there is a decision procedure for M, i.e., M is decidable.


NB These questions caused the birth of Computability Theory, which took over
the research connected with goal D of Hilbert’s Program, and finally provided
answers to these and many other questions.


We will return to questions about the decision procedure in the next chapter. 
Later we will describe how the new theory solved the Entscheidungsproblem (see 
Theorem 9.2 on p. 217). For the present, we continue describing what happened to 
the other two all-important goals (B and C) of Hilbert’s Program. 


4.2.3 Completeness of M: Gödel’s First Incompleteness Theorem


So, how successful was proving the semantic completeness of M (goal B)? In 1931,
hopes of finding such a proof were dashed by 25-year-old Gödel. He proved the
following metamathematical theorem.


Fig. 4.8 Kurt Gödel
(Courtesy: See Preface) 


Theorem 4.1. (First Incompleteness Theorem) If the Formal Arithmetic A is
consistent, then it is semantically incomplete.


Informally, the First Incompleteness Theorem tells us that if A is a consistent
theory, then it is not capable of proving all Truths about natural numbers; there are
statements about natural numbers that are true, but are unprovable within A.

Gödel proved this theorem by constructing an independent formula G in A, i.e., a
formula G such that neither G nor ¬G is derivable in A. In addition, he proved that G
represents a Truth about natural numbers, i.e., that the interpretation of G is true in
the standard model (N, =, +, *, 0, 1) of A. (This is the place to recall NB on p. 58.)

What is more, he proved that even if G were added to the proper axioms of A,
the theory A’ belonging to the extended formal axiomatic system would still be
semantically incomplete, now because of some other formula G’ independent of
A’ yet true in the standard model. Finally, he proved the following generalization:
Any consistent extension of the set of axioms of A gives a semantically incomplete
theory. (For a more detailed explanation of Gödel’s proof, see Box 4.4 on p. 65.)

Informally, the generalization tells us that no consistent theory that includes A
is capable of proving all Truths about the natural numbers; there will always be
statements about the natural numbers that are true, yet unprovable within that theory.


4.2.4 Consequences of the First Incompleteness Theorem 


Gödel’s discovery revealed unexpected limitations of the axiomatic method and
seriously undermined Hilbert’s Program. Since M was supposed to be a consistent
extension of A, it would inevitably be semantically incomplete! This means that the
mathematics developed as a formal theory M would be like a “Swiss cheese full
of holes” (Fig. 4.9), with some of the mathematical Truths dwelling in the holes,
inaccessible to usual mathematical reasoning (i.e., logical deduction in M).


Fig. 4.9 Mathematics developed in the formal axiomatic system M would not be
semantically complete


The independent formulas G, G’, ... of the proof of the First Incompleteness Theorem
are constructed in such a shrewd way that they express their own undecidability
and, at the same time, represent Truths in M (see NB on p. 58 and Box 4.4).

But in other holes of M sit independent formulas that tell us nothing about their
own validity in M. When such a formula is brought to light and its independence
of the theory is uncovered, we may declare that either the formula or its negation
represents a Truth in M (mathematics). In either case our choice does not affect the
consistency of the theory M. However, the choice may not be easy, because either
choice may have a model that behaves as a possible part of mathematics. An example
of this situation is the formula that represents (i.e., describes) the Continuum
Hypothesis, which we encountered on p. 16. (See more about this in Box 4.3.)


Remark. Nevertheless, all of this still does not mean that such Truths will never be recognized.
Note that the First Incompleteness Theorem only says that for recognizing such Truths the
axiomatic method is too weak. So there still remains a possibility that such Truths will be proven
(recognized) with some other methods surpassing the axiomatic method in its proving capability.
Such methods might use non-finitist tools or any other tools yet to be discovered.


Box 4.3 (Undecidability of the Continuum Hypothesis).


Intuitively, the Continuum Hypothesis (CH) conjectures that there is no set with more elements
than natural numbers and fewer than real numbers. In short, there is no cardinal between ℵ_0 and c.

In 1940, Gödel proved the following metamathematical theorem: If ZFC is consistent, then CH
cannot be refuted in ZFC. Then, in 1963, Cohen³ proved: If ZFC is consistent, then CH cannot be
proved in ZFC. Thus, CH is independent of ZFC. Gödel’s and Cohen’s proofs show that neither
CH nor ¬CH is a Truth in ZFC. Discussions about whether or not to add CH to ZFC (and hence
mathematics) still continue.

There is a similar situation with the generalization of CH. Let α be any ordinal and ℵ_α
and 2^(ℵ_α) the cardinalities of a set and its power set, respectively. The Generalized Continuum
Hypothesis (GCH) conjectures: There is no cardinal between ℵ_α and 2^(ℵ_α), i.e., 2^(ℵ_α) = ℵ_(α+1).


Box 4.4 (Proof of the First Incompleteness Theorem). 


The theorem states: Every axiomatizable consistent theory that includes A is incomplete. That is:
Every axiomatizable theory that is consistent and sufficiently strong has countably infinitely many
formulas that are true statements about natural numbers but are not derivable in the theory.

How did Gödel prove this? His first breakthrough was the idea to transform metamathematical
statements about the theory A into formulas of the very theory A. In this way, each statement
about A would become a formula in A, and therefore accessible to a formal treatment within A. In
particular, the metamathematical statement saying that “a given formula of A is not provable in
A” would be transformed into a formula of A. Gödel’s second breakthrough was the construction
of this formula and the use of it in proving its own undecidability. The main steps of the proof are:


1. Arithmetization of A. A syntactic object of A is a symbol, a term, a formula, or any finite
sequence of formulas (e.g., a formal proof).

First, Gödel showed that with every syntactic object X one can associate a precisely defined
natural number γ(X), today called the Gödel number of X. Different syntactic objects have
different Gödel numbers, but there are natural numbers that are not Gödel numbers. The
computation of γ(X) is straightforward. Also straightforward is testing to see whether a number is
a Gödel number and, if so, constructing the syntactic object from it. (See Problems on p. 71.)

Second, Gödel showed that with every syntactic relation (defined on the set of all syntactic
objects) there is associated a precisely defined numerical relation (defined on N). In particular,
with the syntactic relation D(X, Y) = “X is a derivation of formula Y” is associated a numerical
relation D′ ⊆ N² such that D(X, Y) iff D′(γ(X), γ(Y)). All this enabled Gödel to describe A only
with natural numbers and numerical relations. We say that he arithmetized⁴ A.


2. Arithmetization of the Metatheory of A. Gödel then arithmetized the metatheory of A, i.e., the
theory about A. A metatheoretical proposition is a statement that (in natural language and
using special symbols like ⊢ and ⊨) states something about the syntactic objects and syntactic
relations of A. Since Gödel was already able to substitute these with natural numbers and
numerical relations, he could translate such a statement into one referring only to natural numbers
and numerical relations. But notice that such a statement belongs to the theory A and is, therefore,
representable by a formula of A! We see that Gödel was now able to transform metatheoretical
statements about A into formulas of A.


3 Paul Cohen, 1934-2007, American mathematician.
4 Recall (p. 6) that Leibniz had a similar idea, though he aimed to arithmetize man’s reflection.

3. Gödel’s Formula. Now Gödel could do the following: 1) he could transform any metamathematical
statement about formulas of A into a formula F about natural numbers; 2) since
natural numbers can represent syntactic objects, he could interpret F as a statement about
syntactic objects; and 3) since formulas themselves are syntactic objects, he could make it so that
some formula is a statement about itself. How did he do that?

Let Y(H) be a metamathematical statement defined by Y(H) = “Formula H is not provable
in A.” To Y(H) corresponds a formula in A; let us denote it by G(h), where h = γ(H). The
formula G(h) states the same as Y(H), but in number-theoretic vocabulary. Specifically, G(h)
states: “The formula with Gödel number h is not provable in A.”

Now, G(h) is also a formula, so it has a Gödel number, say g. What happens if we take
h := g in G(h)? The result is G(g), a formula that asserts about itself that it is not provable in
A. Clever, huh? To improve readability, we will from now on write G instead of G(g).


4. Incompleteness of A. Then, Gödel proved: G is provable in A iff ¬G is provable in A. (In the
proof of this he used so-called ω-consistency; later, Rosser showed that usual consistency
suffices.) Now suppose that A is consistent. So, G and ¬G are not both provable. Then it remains
that neither is provable. Hence, A is syntactically incomplete. Because there is no proof of G,
we see that what G asserts is in fact true. Thus, G is true in the standard model (N, =, +, *, 0, 1)
of A, yet it is not provable in A. In other words, A is semantically incomplete (see NB on p. 58).


5. Incompleteness of Axiomatic Extensions of A. The situation is even worse. Because G represents
a Truth about natural numbers, it seems reasonable to admit it to the set of axioms of A, hoping
that the extended formal axiomatic system will result in a better theory A^(1). Indeed, A^(1) is
consistent (assuming A is). But again, there is a formula G^(1) of A^(1) (not equivalent to G) that
is independent of A^(1). So, A^(1) is syntactically incomplete. What is more, G^(1) is true in the
standard model (N, =, +, *, 0, 1). Hence, A^(1) is semantically incomplete.

If we insist and add G^(1) to the axioms of A^(1), we get a consistent yet semantically
incomplete theory A^(2) containing an independent formula G^(2) (not equivalent to any of G, G^(1)) that
is true in (N, =, +, *, 0, 1). Gödel proved that we can continue in this way indefinitely, but each
extension A^(i) will yield a consistent and semantically incomplete theory (because of some
formula G^(i), which is not equivalent to any of the formulas G, G^(1), ..., G^(i-1)).
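
Step 1 of the proof (arithmetization) can be made concrete with a small sketch. The encoding below is the classical prime-power scheme: a string of symbols with codes c_1, c_2, ..., c_k is mapped to 2^(c_1) · 3^(c_2) · ... · p_k^(c_k), where p_k is the k-th prime. The alphabet and the code assigned to each symbol are hypothetical choices made for illustration and are not Gödel's original assignment; the function names are likewise assumptions.

```python
SYMBOLS = ["0", "S", "+", "*", "=", "(", ")", "x", "~", ">"]   # hypothetical alphabet
CODE = {s: i + 1 for i, s in enumerate(SYMBOLS)}               # codes 1, 2, 3, ...

def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for short strings)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def godel_number(text):
    """Product of p_i ** code(i-th symbol), p_i being the i-th prime."""
    number = 1
    for p, symbol in zip(primes(), text):
        number *= p ** CODE[symbol]
    return number

def decode(number):
    """Return the encoded string, or None if number is not a Gödel number."""
    if number < 2:
        return None
    symbols = []
    for p in primes():
        if number == 1:
            return "".join(symbols)
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        if not 1 <= exponent <= len(SYMBOLS):
            return None      # a skipped prime, or an exponent that is no symbol code
        symbols.append(SYMBOLS[exponent - 1])

print(godel_number("0=0"))   # 2430, i.e., 2**1 * 3**5 * 5**1
print(decode(2430))          # 0=0
print(decode(10))            # None: 10 = 2 * 5 skips the prime 3
```

By the uniqueness of prime factorization, different strings receive different numbers, and `decode` illustrates both remarks of step 1: some natural numbers (such as 10) are not Gödel numbers, and testing this and recovering the syntactic object is a straightforward computation.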


4.2.5 Consistency of M: Gödel’s Second Incompleteness Theorem


What about the consistency of the would-be theory M (goal C)? Hilbert believed 
that it would suffice to prove the consistency of Formal Arithmetic A only. Then, 
the consistency of other formalized fields of mathematics (due to their construc- 
tion from A) and, finally, the consistency of all formalized mathematics M would 
follow. Thus, the proof of the consistency of M would be relative to A. But this 
also means that, eventually, the consistency of A should be proved with its own 
means and within A alone. In other words, the proof should be constructed without 
the use of other fields of M—except for the First-Order Logic L—because, at that 
time, their consistency (being relative) would not be established beyond any doubt. 
Formal Arithmetic A should demonstrate its own consistency! We say that the proof
of the consistency of A should be absolute. A method that tried to prove the
consistency of A is described in Box 4.5.


Box 4.5 (Absolute Proof of Consistency).


We have seen in Box 4.1 (p. 56) that for any first-order theory F the following holds: If there is a
formula F such that both F and ¬F are provable in the theory, then an arbitrary formula A of the
theory is provable in it. In an inconsistent system everything is provable. Now we see: The
consistency of F would be proved if we found a formula B of F that is not provable in F.

The question now is, how do we find such a formula B? We can use the following method. Let 
P be any property of the formulas of F such that 1) P is shared by all the axioms of F, and 2) P 
is preserved by the rules of inference of F (i.e., if P holds for the premises of a rule, it does so for 
the conclusion). Obviously, theorems of F have the property P. Now, if we find in F a formula B 
that does not have the property P, then B is not a theorem of F and, consequently, F is consistent. 
Obviously, we must construct such a property P that will facilitate the search for B. 

Using this method the consistency of Propositional Calculus P was proved as well as the con- 
sistency of Presburger Arithmetic (i.e., arithmetic where addition is the only operation). 


But in 1931 Gödel also buried hopes that an absolute proof of the consistency of
A would be found. He proved:


Theorem 4.2. (Second Incompleteness Theorem) If the Formal Arithmetic A is
consistent, then this cannot be proved in A.


In other words, A cannot demonstrate its own consistency. The proof of the theorem
is described in Box 4.6.


Box 4.6 (Proof of the Second Incompleteness Theorem). 


The theorem says: If A is consistent, then we cannot prove this using only the means of A.

In proving this theorem Gödel used parts of the proof of his first theorem. Consider the
metamathematical statement “A is consistent.” This statement too is associated with
a formula of A (denote it by C) that says the same thing, but in number-theoretic
vocabulary. Gödel then proved that the formula C ⇒ G is provable in A. Now, if C were provable in
A, then (by Modus Ponens) also G would be provable in A. But in his First Incompleteness Theorem
Gödel proved that G is not provable in A (assuming A is consistent). Hence, also C is not provable
in A (if A is consistent).


4.2.6 Consequences of the Second Incompleteness Theorem 


Gödel’s discovery revealed that proving the consistency of the Formal Arithmetic A
would require means that are more complex (and therefore less transparent) than
those available in A. Of course, a less transparent object or tool may also be more
disputable and more controversial, at least in view of Hilbert’s finitist
recommendations.

In any case, in 1936, Gentzen⁵ proved the consistency of A by using transfinite
induction in addition to usual finitist tools. (See the description of transfinite induc- 
tion in Box 4.7.) Following Gentzen, several other non-finitist consistency proofs of 
A were found. Finally, the belief was accepted that arithmetic A is in fact consistent, 
and that Hilbert’s finitist methods may sometimes be too strict. 

Did these non-finitist consistency proofs of A enable researchers to prove (rela- 
tive to A, as expected by Hilbert) the consistency of other formalized fields of math- 
ematics and, ultimately, of all mathematics M? Unfortunately, no. Namely, there is 
a generalization of the Second Incompleteness Theorem stating: If a consistent the- 
ory F contains A, then the consistency of F cannot be proved within F. Of course, 
this also holds when F := M, the would-be f.a.s. for all mathematics. This was the 
second heavy blow to Hilbert’s Program. 

To prove the consistency of all mathematics, one is forced to use external means 
(non-finitist, metamathematical, or others yet to be discovered), which may be dis- 
putable in view of the finitist philosophy of mathematics. But fortunately, the Second 
Incompleteness Theorem does not imply that the formally developed mathematics 
M would be inconsistent. It only tells us that the chances of proving the consistency 
of such a mathematics in Hilbert’s way are null.⁶


Box 4.7 (Transfinite Induction). 


This is a method of proving introduced by Cantor. Let us first recall mathematical induction. If
a_0, a_1, ... is a sequence of objects (e.g., real numbers) and P is a property sensible of objects a_i,
then to prove that every element of the sequence has this property, we use mathematical induction
as follows: 1) we must prove that P(a_0) holds, and 2) we must prove that P(a_n) ⇒ P(a_(n+1)) holds
for an arbitrary natural number n. In other words, if P holds for an element a_n, then it holds for its
immediate successor a_(n+1).

To make the description of transfinite induction more intuitive, let us take a sequence a_0, a_1, ...,
where a_i are real numbers and there is no index after which all the elements are equal. Suppose that
the sequence converges and a* is the limit. (If we take, for example, a_n = n/(n+1), we have a* = 1.)
The limit a* is not a member of the sequence, because a* ≠ a_n for every natural number n. But we
can consider a* to be an infinite (in order) element of the sequence, that is, the element that comes


5 Gerhard Karl Erich Gentzen, 1909-1945, German mathematician and logician. 


6 Even if the mathematics is inconsistent, there are attempts to overcome this. A recent approach 
to accommodate inconsistency of a theory in a sensible manner is the paraconsistent logic of Priest.⁷
The approach challenges the classical result from Box 4.1 (p. 56) that from contradictory premises 
anything can be inferred. “Mathematics is not the same as its foundations,” advocate the researchers 
of paraconsistency, “so contradictions may not necessarily affect all of ‘practical’ mathematics.” 
They have shown that in certain theories, called paraconsistent, contradictions may be allowed to 
arise, but they need not infect the whole theory. In particular, a paraconsistent axiomatic set theory 
has been developed that includes cardinals and ordinals and is capable of supporting the core 
of mathematics. Further developments of different fields of mathematics, including arithmetic, in 
paraconsistent logics are well underway. 


7 Graham Priest, 1948, English-Australian analytic philosopher and logician.


after every element $a_n$, where $n \in \mathbb{N}$. Since $\omega$ is the smallest ordinal that is larger than any natural
number (see p. 17), we can write $a^* = a_\omega$. The sequence can now be extended by $a_\omega$ and denoted


$a_0, a_1, \ldots, a_\omega$.


Now, what if we wanted to prove that the property $P$ holds for every element of this extended
sequence? It is obvious that mathematical induction cannot possibly succeed, because it does not
allow us to infer $P(a_\omega)$. The reason is that $a_\omega$ is not an immediate successor of any $a_n$, $n \in \mathbb{N}$, so
we cannot prove $P(a_n) \Rightarrow P(a_\omega)$ for any natural $n$.

An ordinal that is neither 0 nor the immediate successor of another ordinal is called a limit
ordinal. There are infinitely many limit ordinals, with $\omega$ being the smallest of them. Mathematical
induction fails at each limit ordinal. Transfinite induction remedies that.


Principle of Transfinite Induction: Let $(S, \preceq)$ be a well-ordered set and $P$ a property sensible for
its elements. Then $P$ holds for every element of $S$ if the following condition is met:


• $P$ holds for $y \in S$ if $P$ holds for every $x \in S$ such that $x \prec y$.


Transfinite induction is a generalization of mathematical induction. It can be used to prove that 
a given property P holds for all ordinals (or all elements of a well-ordered set; see Appendix A). 
Normally it is used as follows: 


1. Suppose that P does not hold for all ordinals. 

2. Therefore, there is a smallest ordinal, say $\alpha$, for which we have $\neg P(\alpha)$.
3. Then we try to deduce a contradiction. 

4. If we succeed, we conclude: P holds for every ordinal. 
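

In symbols, the principle and the minimal-counterexample argument of steps 1 to 4 can be sketched as follows. This is only a restatement of the text above; $\prec$ stands for the strict part of the well-order, and $\alpha$, $\beta$ range over ordinals.

```latex
% Principle of transfinite induction on a well-ordered set (S, \preceq)
\Bigl[\forall y \in S\;\bigl((\forall x \in S)(x \prec y \Rightarrow P(x)) \Rightarrow P(y)\bigr)\Bigr]
\;\Longrightarrow\; \forall y \in S\; P(y)

% Steps 1-2: if P fails somewhere, well-ordering gives a least counterexample \alpha with
\neg P(\alpha) \;\wedge\; \forall \beta\,(\beta < \alpha \Rightarrow P(\beta))

% Steps 3-4: deriving P(\alpha) from \forall \beta < \alpha\; P(\beta) contradicts \neg P(\alpha),
% so P holds for every ordinal.
```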


4.3 Legacy of Hilbert’s Program 


The ideas of Whitehead and Russell put forward in their Principia Mathematica 
(Sect. 2.2.3) proved to be unrealistic. Mathematics cannot be founded on logic only. 

Also, Hilbert’s Program (Sect. 4.1.2) failed. The mechanical, syntax-directed 
development of mathematics within the framework of formal axiomatic systems 
may be safe from paradoxes, yet this safety does not come for free. The mathemat- 
ics developed in this way suffers from semantic incompleteness and the lack of a 
possibility of proving its consistency. All this makes Hilbert’s ultimate idea of the 
mechanical development of mathematics questionable. 


Aspiration and Inspiration 


Consequently, it seems that research in mathematics cannot avoid human inspi- 
ration, ingenuity, and intuition (deceptive though that can be). See Fig. 4.10. 
A fortiori, Leibniz’s idea (see p. 6) of replacing human reflection by mechanical and
mechanized arithmetic is just an illusion. Mathematics and other axiomatic sciences
selfishly guard their Truths; they admit to these Truths only humans who, in addition
to demonstrating a strong aspiration for knowledge, demonstrate sufficient
inspiration and ingenuity.

Fig. 4.10 Research cannot avoid inspiration, ingenuity, and intuition


4.4 Chapter Summary 


Hilbert proposed a promising recovery program to put mathematics on a sound foot- 
ing and eliminate all the paradoxes. To do this, the program would use formal ax- 
iomatic systems and their theories. 

More specifically, Hilbert aimed to define a formal axiomatic system M such that 
the theory M developed in it would contain the whole of mathematics. The theory M 
would also comply with several fundamental requirements: It would be consistent, 
semantically complete, and decidable. In addition, Hilbert required that a decision 
procedure for M should be devised, i.e., an algorithm should be constructed capable 
of deciding, for any formula of M, whether the formula represents a mathematical 
Truth. 

It was soon realized that M must contain at least First-Order Logic L and Formal
Arithmetic A. This, however, enabled Gödel to discover that there can be no such M!
In particular, in his First Incompleteness Theorem, Gödel proved that if A, the Formal
Arithmetic, is consistent, then it is also semantically incomplete. What is more,
any consistent formal axiomatic theory that includes A is semantically incomplete.
In his Second Incompleteness Theorem, Gödel proved that if A is consistent, this
cannot be proved in A. Moreover, if a consistent formal axiomatic theory contains
A, then the consistency of the theory cannot be proved in the theory.

Although Gödel’s discovery shattered Hilbert’s Program, the problem of finding
an algorithm that is a decision procedure for a given theory remained topical.


Problems


Definition 4.1. (Gödel numbering) The arithmetization of the Formal Arithmetic A associates
each syntactic object $X$ of A with the corresponding Gödel number $\gamma(X) \in \mathbb{N}$. The function $\gamma$
can be defined in the following way:


• the following symbols are associated with Gödel numbers as described:
  - logical connectives, “¬”,1; “∨”,2; “⇒”,3; (that is, $\gamma(\neg) = 1$; $\gamma(\vee) = 2$; $\gamma(\Rightarrow) = 3$)
  - the quantification symbol, “∀”,4;
  - the equality symbol, “=”,5;
  - the individual-constant symbol, “0”,6;
  - the unary function symbol for the successor function, “′”,7;
  - punctuation marks, “(”,8; “)”,9; “,”,10.


• individual-variable symbols are associated with increasing prime numbers greater than 10;
for example, $x$,11; $y$,13; $z$,17; and so on.


• predicate symbols are associated with squares of increasing prime numbers greater than 10;
for example, $P$,$11^2$; $Q$,$13^2$; $R$,$17^2$; and so on.


• a formula $F$, viewed as a sequence $F = s_1 s_2 \ldots s_k$ of symbols, is associated with the number
$\gamma(F) = p_1^{\gamma(s_1)} p_2^{\gamma(s_2)} \cdots p_k^{\gamma(s_k)}$, where $p_i$ is the $i$th prime number. For example, the Gödel number
of the axiom $\forall x \forall y (x = y \Rightarrow x' = y')$ is
$2^{4}\, 3^{11}\, 5^{4}\, 7^{13}\, 11^{8}\, 13^{11}\, 17^{5}\, 19^{13}\, 23^{3}\, 29^{11}\, 31^{7}\, 37^{5}\, 41^{13}\, 43^{7}\, 47^{9}$.


• a formal proof (derivation) $D$, viewed as a sequence $D = F_1, F_2, \ldots, F_n$ of formulas, is
associated with the Gödel number $\gamma(D) = p_1^{\gamma(F_1)} p_2^{\gamma(F_2)} \cdots p_n^{\gamma(F_n)}$, where $p_i$ is the $i$th prime
number.


Remark. Gödel’s original arithmetization was more succinct: “0”,1; “′”,3; “¬”,5; “∨”,7; “∀”,9;
“(”,11; “)”,13. The symbols such as ∧, ⇒, =, ∃ are only abbreviations and can be represented by
the previous ones. Individual-variable symbols were associated with prime numbers greater than
13, and predicate symbols with squares of these prime numbers. Gödel also showed that, for $k \geq 2$,
$k$-ary function and predicate symbols can be represented by the previous symbols.
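

The encoding of a formula in Definition 4.1 can be tried out mechanically. The following is a minimal sketch, not part of the original text: the names `SYMBOL_CODES`, `primes`, and `godel_number` are hypothetical, and the table is restricted to the variables x, y, z to keep it short. It only covers the formula case of the definition (symbols to exponents of consecutive primes).

```python
from itertools import count

# Symbol codes copied from Definition 4.1. Limiting the table to the
# variables x, y, z is an assumption made to keep the sketch short.
SYMBOL_CODES = {
    "¬": 1, "∨": 2, "⇒": 3, "∀": 4, "=": 5,
    "0": 6, "′": 7, "(": 8, ")": 9, ",": 10,
    "x": 11, "y": 13, "z": 17,
}


def primes():
    """Yield the primes 2, 3, 5, ... using trial division."""
    found = []
    for candidate in count(2):
        if all(candidate % p != 0 for p in found):
            found.append(candidate)
            yield candidate


def godel_number(formula):
    """Return the product of p_i ** code(s_i) over the symbols s_i of formula."""
    result = 1
    for prime, symbol in zip(primes(), formula):
        result *= prime ** SYMBOL_CODES[symbol]
    return result


axiom = "∀x∀y(x=y⇒x′=y′)"
exponents = [SYMBOL_CODES[s] for s in axiom]
print(exponents)            # exponents of 2, 3, 5, 7, ... in the Gödel number
print(godel_number(axiom))  # a very large natural number
```

The list of exponents printed for the axiom can be compared, position by position, with the exponents in the example of the definition.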


4.1. Prove: The function $\gamma : \Sigma^* \to \mathbb{N}$ is injective. (Remark. $\Sigma$ denotes the set of symbols of A.)
[Hint. Use the Fundamental Theorem of Arithmetic.]


4.2. Let $n \in \mathbb{N}$. Describe how we can decide whether or not there exists an $F \in$ A such that $n = \gamma(F)$.

4.3. Let $n = \gamma(F)$ for some $F \in$ A. Describe how we can reconstruct $F = \gamma^{-1}(n)$.

4.4. Informally describe how we can decide, for any sequence $F_1, F_2, \ldots, F_n$ of formulas of A,
whether or not the sequence is a derivation of $F_n$ in A.
[Hint. See Box 4.2 on p. 62.]


Part I
CLASSICAL COMPUTABILITY 
THEORY 


Our intuitive understanding of the concept of the algorithm, which perfectly sufficed 
for millennia, proved to be insufficient as soon as the non-existence of a certain al- 
gorithm had to be proven. This triggered the search for a model of computation, a 
formal characterization of the concept of the algorithm. In this part we will describe 
different competing models of computation. The models are equivalent, so we will 
adopt the Turing machine as the most appropriate one. We will then describe the Tur- 
ing machine in greater detail. The existence of the universal Turing machine will be 
proven and its impact on the creation and development of the general-purpose com- 
puter will be explained. Then, several basic yet crucial theorems of Computability 
Theory will be deduced. Finally, the existence of incomputable problems will be 
proven, a list of such problems from practice will be given, and several methods for 
proving the incomputability of problems will be explained. 


Chapter 5
The Quest for a Formalization 


A model of a system or process is a theoretical description that 
can help you understand how the system or process works. 


Abstract The difficulties that arose at the beginning of the twentieth century shook 
the foundations of mathematics and led to several fundamental questions: “What is 
an algorithm? What is computation? What does it mean when we say that a func- 
tion or problem is computable?” Because of Hilbert’s Program, intuitive answers 
to these questions no longer sufficed. As a result, a search for appropriate definitions
of these fundamental concepts followed. In the 1930s it was discovered (miraculously,
as Gödel put it) that all these notions can be formalized, i.e., mathematically
defined; indeed, they were formalized in several completely different yet
equivalent ways. After this, they finally became amenable to mathematical analysis 
and could be rigorously treated and used. This opened the door to the seminal results 
of the 1930s that marked the beginning of Computability Theory. 


5.1 What Is an Algorithm and What Do We Mean by 
Computation? 


We have seen (Sect. 4.2.2) that if M is consistent and syntactically complete then
there exists a decision procedure for M. But the hopes for a complete and unquestionably
consistent M were shattered by Gödel’s Theorems. Does this necessarily
mean that there cannot exist a decision procedure for M? Did this put an end to research
on the Entscheidungsproblem? In this case, no. Namely, the proofs of Gödel’s
Theorems only used logical notions and methods; they did not involve loose notions
of algorithm and computation. Specifically, the First Incompleteness Theorem revealed
that there are mathematical Truths that cannot be derived in M. But it was
not obvious that there could be no other way of recognizing every mathematical
Truth. (After all, Gödel himself was able to determine that his formulas $G, G^{(1)}, \ldots$
are true although undecidable.) Because of this, the Decidability Problem for M
and, in particular, the Entscheidungsproblem with its quest for a decision procedure
(algorithm) $D_{Entsch}$, kept researchers’ interest. Yet, the Entscheidungsproblem
proved to be much harder than expected (see p. 61). What is an algorithm, anyway?
It became clear that the problem could not be solved unless the intuitive, loose definition
of the concept of the algorithm was replaced by a formal, rigorous definition.


5.1.1 Intuition and Dilemmas 


So, let us return to the intuitive definition of the algorithm (Definition 1.1 on p. 4), 
which was at Hilbert’s disposal: An algorithm for solving a given problem is a recipe 
consisting of a finite number of instructions that, if strictly followed, leads to the 
solution of the problem. When Hilbert set the goal “Find an algorithm that is a 
decision procedure for M,” he, in essence, asked us to conceive an appropriate recipe 
(functioning, at least in principle, as a decision procedure for M) by using only 
common sense, logical inference, knowledge, experience, and intuition (subject only 
to finitist restrictions). In addition, the recipe was to come with an idea of how it 
would be executed (at least in principle). We see that the notions of the algorithm 
and its execution were entirely intuitive. 

But there were many questions about such an understanding of the algorithm 
and its execution. What would be the kind of basic instructions used to compose 
algorithms? In particular, would they execute in a discrete or a continuous way? 
Would their execution and results be predictable (i.e., deterministic) or probabilistic 
(i.e., dependent on random events)? These questions were relevant in view of the 
discoveries being made in physics at the time.¹

Which instructions should be basic? Should there be only finitely many of them? 
Would they suffice for composing any algorithm of interest? If there were infinitely 
many basic instructions, would that not demand a processor of unlimited capability? 
Would that be realistic? But if the processor were of limited capability, could it be 
universal, i.e., capable of executing any algorithm of interest?²

Then there were more down-to-earth questions. How should the processor be 
constructed in order to execute the algorithms? Where would the processor keep 
the algorithms and where would the input data be? Should data be of arbitrary or 
limited size? Should storage be limited or unlimited? Where and how would basic 


1 Is nature discrete or continuous? At the beginning of the twentieth century it was discovered
that energy exchange in nature seems to be continuous at the macroscopic level, but is discrete at
the microscopic level. Specifically, energy exchange between matter and waves of frequency $\nu$ is
only possible in discrete portions (called quanta) of sizes $nh\nu$, $n = 1, 2, 3, \ldots$, where $h$ is Planck’s
constant. Notice that some energy must be consumed during the instruction execution.

Is nature predictable or random? Nature at the macroscopic level (i.e., nature dealt with by 
classical physics) is predictable. That is, each event has a cause, and when an event seems to be 
random, it is only because we lack knowledge of its causes. Such randomness of nature is said to 
be subjective. In contrast, there is objective randomness in microscopic nature (i.e., nature dealt 
with by quantum physics). Here, an event may be entirely unpredictable, having no cause until it 
happens. Only a probability of occurrence can be associated with the event. 

So, how do all these quantum phenomena impact instruction execution? (Only recently have 
quantum algorithms appeared; they strive to use these phenomena in problem solving.) 


2 Recall that such universality of a processor was Babbage’s goal nearly a century earlier (p. 6).


instructions execute? Where would the processor keep the intermediate and final
results? In addition, there were questions of how to deal with algorithm–processor
pairs. For example, should it be possible to encode these pairs with natural numbers
in order to enable their rigorous, or even metatheoretical, treatment?³ Should the
execution time (e.g., number of steps) of the algorithm be easily derivable from the
description of the algorithm, the input data, and the processor?


5.1.2 The Need for Formalization 


The “big” problem was this: How does one answer the question “Is there an algo- 
rithm that solves a given problem?” when it is not clear what an algorithm is? 


Fig. 5.1 To decide whether something exists we must first understand what it should be
(Cartoon dialogue: “Greg, does a dybbuk exist?” “Becky, tell me first what ‘dybbuk’ means!”)


To prove that there is an algorithm that solves the problem, it sufficed to con- 
struct some candidate recipe and show that the recipe meets all the conditions (i.e., 
the recipe has a finite number of instructions, which are reasonably difficult, and can 
be mechanically followed and executed by any processor, be it human or machine, 
leading it, in finite time, to the solution of the problem). The loose, intuitive un- 
derstanding of the concept of the algorithm was no obstacle for such a constructive 
existence proof. 

In contrast, proving that there is no algorithm for the problem was a much bigger 
challenge. A non-existence proof should reject every possible recipe by showing 
that it does not meet all the conditions necessary for an algorithm to solve the prob- 
lem. However, to accomplish such a proof, a characterization of the concept of 
the algorithm was needed. In other words, a property had to be found such that all 
algorithms and algorithms only have this property. Such a property would then be 
characteristic of algorithms. In addition, a precise and rigorous definition of the pro- 
cessor, i.e., the environment capable of executing algorithms, had to be found. Only 
then would the necessary condition for proving the non-existence of an algorithm 
be fulfilled. Namely, having the concept of the algorithm characterized, one would 
be in a position to systematically (i.e., with mathematical methods) eliminate all the 


3 This idea was inspired by Gödel numbers, introduced in his First Incompleteness Theorem.


infinitely many possible recipes by showing that none of them could possibly fulfill
the conditions necessary for an algorithm to solve the problem.

A definition that formally characterizes the basic notions of algorithmic computation
(i.e., the algorithm and its environment) is called a model of computation.


NB From now until p. 99 and then on p. 104, we will use quotation marks to refer to the then- 
intuitive understanding of the notions of the algorithm, computation, and computable function. 


Thus, “algorithm”, “computation”, and “computable” function. So Hilbert asked to construct an
“algorithm” $D_{Entsch}$ that would answer the question ao for the arbitrary formula $F \in$ M.


5.2 Models of Computation 


In this section we will describe the search for an appropriate model of computation. 
The search started in 1930. The goal was to find a model of computation that would 
characterize the notions of “algorithm” and “computation”. Different ideas arose 
from the following question: 


What could a model of computation take as an example? 


On the one hand, it was obvious that man is capable of complex “algorithmic
computation”, yet there was scarcely any idea how he does this. On the other hand,
while the operation of mechanical machines of the time was well understood, it was
far from complex, human-like “algorithmic computation”.

As a result, three attempts were made: modeling the computation after functions,
after humans, and after languages. Each direction proposed important models of 
computation. In this section we will describe them in detail. 


5.2.1 Modeling After Functions 


The first direction focused on the question 


What does it mean when we say that we “compute” the value of
a function $f : A \to B$, or when we say that the function is “computable”?


To get to an answer, it was useful to confine the discussion to functions that
were as simple as possible. It seemed that such functions were the total numerical
functions $f : \mathbb{N}^k \to \mathbb{N}$, where $k \geq 1$. If $f$ is such a function, then for any $k$-tuple
$(x_1, \ldots, x_k)$ of natural numbers, there is a unique natural number called the value of
$f$ at $(x_1, \ldots, x_k)$ and denoted by $f(x_1, \ldots, x_k)$. (Later, in Sects. 5.3.3 and 5.3.4, we
will see that the restriction to total functions had to be relaxed and the discussion
extended to partial (i.e., total and non-total) functions.)

After this, the search for a definition of “computable” total numerical functions
began. It was obvious that any such definition should fulfill two requirements:


1. Completeness Requirement: The definition should include all the “computable” 
total numerical functions, and nothing else. 

2. Effectiveness Requirement: The definition should make evident, for each such 
function $f$, an effective procedure for calculating the value $f(x_1, \ldots, x_k)$.


An effective procedure was to be a finite set S of instructions such that 
a. each instruction in S is exact and expressed by a finite number of symbols; 
b. S can, in practice and in principle, be carried out by a human; 
c. S can be carried out mechanically, without insight, intuition, or ingenuity; 
d. if carried out without error, S yields the desired result in finitely many steps. 


Thus, “computable” functions were usually called effectively calculable. 


The first requirement asked for a characterization of the intuitively computable total 
numerical functions. Only if the second requirement was fulfilled would the defined 
functions be considered algorithmically computable. Notice that although the notion 
of the effective procedure is a refinement of the intuitive notion of the algorithm (see 
Definition 1.1 on p. 4), it is still an intuitive, informal notion. Of course, an algorithm 
for f would be an effective procedure disclosed by f’s definition. 

Definitions were proposed by Gödel and Kleene (μ-recursive functions), Herbrand
and Gödel (general recursive functions), and Church (λ-definable functions).


μ-Recursive Functions


In the proof of his Second Incompleteness Theorem, Gödel introduced numerical functions, the
construction of which resembled the derivations of theorems in formal axiomatic systems
and their theories. More precisely, Gödel fixed three simple initial functions,
$\zeta : \mathbb{N} \to \mathbb{N}$, $\sigma : \mathbb{N} \to \mathbb{N}$, and $\pi_i^k : \mathbb{N}^k \to \mathbb{N}$ (called the zero, successor, and projection
function, respectively), and two rules of construction (called composition and
primitive recursion) for constructing new functions from the initial and previously
constructed ones.⁴ (There are more details in Box 5.1.)

The functions constructed from $\zeta$, $\sigma$, and $\pi_i^k$ by finitely many applications of
composition and primitive recursion are total and said to be primitive recursive.
Although Gödel’s intention was to use them in proving his Second Incompleteness
Theorem, they displayed a property much desired at that time. Namely, the construction
of a primitive recursive function is also an effective procedure for computing
its values. So, a construction of such a function seemed to be the formal counterpart
of the “algorithm”, and Gödel’s definition of primitive recursive functions seemed
to be the wished-for definition of the “computable” total numerical functions.

However, Ackermann and others found the total numerical functions called the
Ackermann functions (see p. 108), which were intuitively computable but not primitive
recursive. So Gödel’s definition did not meet the Completeness Requirement.
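

To see why such functions count as intuitively computable, it may help to look at one written as a short recipe. The sketch below uses the common two-argument Ackermann–Péter variant; this choice is an assumption of the sketch and may differ from the definition given on p. 108. The nested recursion on both arguments is what takes the function outside the primitive recursive class, although every value is still obtained in finitely many steps.

```python
def ackermann(m, n):
    """Two-argument Ackermann-Peter function (an assumed variant, see the lead-in)."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    # Nested call: the second argument is itself a value of the function.
    return ackermann(m - 1, ackermann(m, n - 1))


print(ackermann(1, 2))
print(ackermann(2, 3))
```

Only small arguments are practical here; the values and the recursion depth grow extremely fast in the first argument.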


4 The resemblance between function construction and theorem derivation is obvious: Initial functions
correspond to axioms, and rules of construction correspond to rules of inference.


Fig. 5.2 Kurt Gödel (Courtesy: See Preface)
Fig. 5.3 Stephen Kleene (Courtesy: See Preface)


This deficiency was eliminated in 1936 by Kleene. He added to Gödel’s definition
a third rule of construction, called the μ-operation, which seeks indefinitely
through the series of natural numbers for one satisfying a primitive recursive relation.
(See Box 5.1.) The functions that can be constructed from $\zeta$, $\sigma$, and $\pi_i^k$ by
finitely many applications of composition, primitive recursion, and the μ-operation
are said to be μ-recursive.


NB Kleene assumed that the μ-operation would be applied to construct only total functions, although
it could also return non-total functions (i.e., those that are undefined for some arguments).


The class of μ-recursive total functions proved to contain any conceivable intuitively
computable total numerical function. So, Gödel-Kleene’s definition became
a plausible formalization of a “computable” numerical function. Consequently, construction
of a μ-recursive function became a plausible formalization of the notion of
the “algorithm”. All this was gathered in the following model of computation.


Model of Computation (Gödel-Kleene’s Characterization):


• An “algorithm” is a construction of a μ-recursive function.

• A “computation” is a calculation of a value of a μ-recursive function that
proceeds according to the construction of the function.

• A “computable” function is a μ-recursive total function.


Box 5.1 (μ-Recursive Functions).


Informally, a function is said to be μ-recursive if either it is an initial function or it has been
constructed from initial or previously constructed functions by a finite application of the three
rules of construction. Remark: For brevity, we will write in this box $\vec{n}$ instead of $n_1, \ldots, n_k$.


The initial functions are:


a. $\zeta(n) = 0$, for every natural $n$ (Zero function)
b. $\sigma(n) = n + 1$, for every natural $n$ (Successor function)
c. $\pi_i^k(\vec{n}) = n_i$, for arbitrary $\vec{n}$ and $1 \leq i \leq k$ (Projection (or identity) function)


5 Stephen Cole Kleene, 1909-1994, American mathematician.


The rules of construction are:


1. Composition. Let the given functions be $g : \mathbb{N}^m \to \mathbb{N}$ and $h_i : \mathbb{N}^k \to \mathbb{N}$, for $i = 1, \ldots, m$.
Then the function $f : \mathbb{N}^k \to \mathbb{N}$ defined by

$f(\vec{n}) \overset{\text{def}}{=} g(h_1(\vec{n}), \ldots, h_m(\vec{n}))$

is said to be constructed by composition of the functions $g$ and $h_i$, $i = 1, \ldots, m$.


2. Primitive Recursion. Let the given functions be $g : \mathbb{N}^k \to \mathbb{N}$ and $h : \mathbb{N}^{k+2} \to \mathbb{N}$.
Then the function $f : \mathbb{N}^{k+1} \to \mathbb{N}$ defined by

$f(\vec{n}, 0) \overset{\text{def}}{=} g(\vec{n})$
$f(\vec{n}, m+1) \overset{\text{def}}{=} h(\vec{n}, m, f(\vec{n}, m))$, for $m \geq 0$

is said to be constructed by primitive recursion from the functions $g$ and $h$.


3. μ-Operation (Unbounded Search). Let the given function be $g : \mathbb{N}^{k+1} \to \mathbb{N}$.
Then the function $f : \mathbb{N}^k \to \mathbb{N}$ defined by

$f(\vec{n}) \overset{\text{def}}{=} \mu x\, g(\vec{n}, x)$

is said to be constructed by the μ-operation from the function $g$. Here, the μ-operation is defined
as follows: $\mu x\, g(\vec{n}, x) \overset{\text{def}}{=}$ least $x \in \mathbb{N}$ such that $g(\vec{n}, x) = 1 \wedge g(\vec{n}, z)$ is defined for $z = 0, \ldots, x$.
NB $\mu x\, g(\vec{n}, x)$ may be undefined, e.g., when $g$ is such that $g(\vec{n}, x) \neq 1$ for every $x \in \mathbb{N}$.
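

The unbounded search of rule 3 can be sketched as a loop without an upper bound. The helper name `mu` and the test function `g` below are hypothetical, chosen only for illustration. The sketch also shows where partiality comes from: if no suitable $x$ exists, or if $g$ fails to return for some smaller $z$, the loop never terminates, which corresponds to the NB above.

```python
from itertools import count


def mu(g):
    """Build f(*n) = least x with g(*n, x) == 1, by unbounded search."""
    def f(*n):
        for x in count(0):          # x = 0, 1, 2, ... with no upper bound
            if g(*n, x) == 1:
                return x
        # Not reachable: if no such x exists, the loop above runs forever.
    return f


# Hypothetical test function: g(n, x) = 1 exactly when (x + 1)**2 > n.
def g(n, x):
    return 1 if (x + 1) ** 2 > n else 0


integer_sqrt = mu(g)                # least x with (x + 1)**2 > n
print(integer_sqrt(10))
```

With this particular `g` a suitable $x$ always exists, so the constructed function is total; replacing `g` by a function that never returns 1 would make `mu(g)` undefined everywhere.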


The construction of a μ-recursive function $f$ is a finite sequence $f_1, f_2, \ldots, f_\ell$, where $f_\ell = f$ and
each $f_i$ is either one of the initial functions $\zeta$, $\sigma$, $\pi_i^k$, or is constructed by one of the rules 1, 2, 3 from
its predecessors in the sequence. Taken formally, the construction is a finite sequence of symbols.
But there is also a practical side of construction: After fixing the values of the input data, we can
mechanically and effectively calculate the value of $f$ simply by following its construction and
calculating the values of the intermediate functions. A construction is an algorithm.
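

The claim that a construction can be followed mechanically can be illustrated with a small sketch. The helper names (`proj`, `compose`, `primitive_recursion`) are hypothetical; they mirror the initial functions and rules 1 and 2 of this box, and the five-step construction used as the illustration is the one for addition that Example 5.1 works out by hand. The recursion of rule 2 is unfolded as a loop from $m = 0$ upward, which computes the same values.

```python
def zero(n):
    return 0


def succ(n):
    return n + 1


def proj(i, k):
    """Projection pi_i^k: return the i-th of k arguments (1-based)."""
    def pi(*args):
        assert len(args) == k
        return args[i - 1]
    return pi


def compose(g, *hs):
    """Rule 1: f(n) = g(h_1(n), ..., h_m(n))."""
    def f(*n):
        return g(*(h(*n) for h in hs))
    return f


def primitive_recursion(g, h):
    """Rule 2: f(n, 0) = g(n); f(n, m + 1) = h(n, m, f(n, m))."""
    def f(*args):
        *n, m = args
        value = g(*n)                    # f(n, 0)
        for step in range(m):
            value = h(*n, step, value)   # f(n, step + 1)
        return value
    return f


f1 = proj(1, 1)                      # initial function
f2 = proj(3, 3)                      # initial function
f3 = succ                            # initial function
f4 = compose(f3, f2)                 # rule 1 applied to f3 and f2
f5 = primitive_recursion(f1, f4)     # rule 2 applied to f1 and f4
print(f5(2, 3))  # 5
```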


Example 5.1. (Addition) Let us construct the function $sum(n_1, n_2) \overset{\text{def}}{=} n_1 + n_2$. We apply the following
idea: To calculate $sum(n_1, n_2)$ we first calculate $sum(n_1, n_2 - 1)$ and then its successor. The
computation involves primitive recursion, which terminates when $sum(n_1, 0)$ should be calculated.
In the latter case, the sum is just the first summand, $n_1$. In the list below, the left-hand column
contains initial and constructed functions needed to implement the idea, and the right-hand column
explains, for each function, why it is there or how it was constructed. The function $sum$ is $f_5$.


1. $\pi_1^1(x)$ to extract its argument and make it available for use
2. $\pi_3^3(x, y, z)$ to introduce the third variable, which will eventually be the result $x + y$
3. $\sigma(x)$ to increment its argument
4. $f_4(x, y, z)$ to increment the third argument; constructed by composition of 3. and 2.
5. $f_5(x, y)$ to compute $x + y$; constructed by primitive recursion from 1. and 4.


Now, the construction of $sum$ is a sequence of functions and information about the rules applied:
$f_1 = \pi_1^1(n_1)$; $f_2 = \pi_3^3(n_1, n_2, n_3)$; $f_3 = \sigma(n_1)$; $f_4(n_1, n_2, n_3)$ [rule 1, $f_3$, $f_2$]; $f_5(n_1, n_2)$ [rule 2, $f_1$, $f_4$].
The function $f_5$ is constructed by primitive recursion from functions $f_1$ and $f_4$, so we have:
$f_5(n_1, n_2) = f_4(n_1, n_2 - 1, f_5(n_1, n_2 - 1)) = f_5(n_1, n_2 - 1) + 1 = \ldots = f_5(n_1, 0) + n_2 = \pi_1^1(n_1) + n_2 = n_1 + n_2$.
Given $n_1$ and $n_2$, say $n_1 = 2$ and $n_2 = 3$, we can calculate $f_5(2, 3)$ by following the construction
and calculating the values of functions: $f_1 = \pi_1^1(n_1) = 2$; $f_2 = \pi_3^3(n_1, n_2, n_3) = n_3$;
$f_3 = \sigma(n_1) = 3$; $f_4(n_1, n_2, n_3) = \sigma(\pi_3^3(n_1, n_2, n_3)) = \sigma(n_3) = n_3 + 1$;
$f_5(n_1, n_2) = f_5(2, 3) = f_4(2, 2, f_5(2, 2)) = f_5(2, 2) + 1 = f_4(2, 1, f_5(2, 1)) + 1 = f_5(2, 1) + 2$
$= f_4(2, 0, f_5(2, 0)) + 2 = f_5(2, 0) + 3 = 2 + 3 = 5$.


(General) Recursive Functions


In 1931, the then 23-year-old Herbrand⁶ investigated how to define the total numerical
functions $f : \mathbb{N}^k \to \mathbb{N}$ using systems of equations. Before suffering a fatal
accident while mountain climbing in the French Alps, he explained his ideas in a
letter to Gödel. We cite Herbrand (with the function names changed):

If $f$ denotes an unknown function and $g_1, \ldots, g_n$ are known functions, and if the $g$’s and $f$
are substituted in one another in the most general fashions and certain pairs of the resulting
expressions are equated, then, if the resulting set of functional equations has one and only
one solution for $f$, $f$ is a recursive function.


Fig. 5.4 Jacques Herbrand (Courtesy: See Preface)


Gödel noticed that Herbrand did not make clear what the rules for computing the
values of such an $f$ would be. He also noted that such rules would be the same for
all the functions $f$ defined in Herbrand’s way. Thus, Gödel improved on Herbrand’s
idea in two steps. First, he added two conditions to Herbrand’s idea:


• A system of equations must be in standard form, where $f$ is only allowed to be
on the left-hand side of the equations, and it must appear as

$f(g_i(\ldots), \ldots, g_j(\ldots)) = \ldots$

(Note: This is not required for $g_1, \ldots, g_n$, so these may be defined by recursion.)
• A system of equations must guarantee that $f$ is well defined (i.e., single-valued).


Let us denote by $\mathcal{E}(f)$ a system of equations that fulfills the two conditions and
defines a function $f : \mathbb{N}^k \to \mathbb{N}$. Second, Gödel started to search for the rules by which
$\mathcal{E}(f)$ is used to compute the values of $f$. In 1934, he realized that there are only two
such rules:


1. In an equation, all occurrences of a variable can be substituted by the same number
(i.e., the value of the variable).
2. In an equation, an occurrence of a function can be replaced by its value.


Thus Gödel (i) specified the form of equations and required that (ii) exactly one
equation $f(\ldots) = \ldots$ is deducible by substitution and replacement in $\mathcal{E}(f)$, and (iii)
the equations giving the values of $g_1, \ldots, g_n$ are defined in a similar way.

A function $f : \mathbb{N}^k \to \mathbb{N}$ for which there exists a system $\mathcal{E}(f)$ Gödel called general
recursive. Today we call it recursive.


6 Jacques Herbrand, 1908-1931, French logician and mathematician.


It seemed that any conceivable intuitively computable total numerical function
could be defined and effectively computed in a mechanical fashion by some system
of equations of this kind. The Completeness Requirement and Effectiveness Requirement
of such a definition of “computable” functions seemed to be satisfied. So,
Herbrand and Gödel’s ideas were merged in the following model of computation.


Model of Computation (Herbrand-Gödel’s Characterization):


• An “algorithm” is a system of equations $\mathcal{E}(f)$ for some function $f$.

• A “computation” is a calculation of a value of a (general) recursive
function $f$ that proceeds according to $\mathcal{E}(f)$ and the rules 1, 2.

• A “computable” function is a (general) recursive total function.


λ-definable functions


We start with two examples that give us the necessary motivation. 


Example 5.2. (Motivation) What is the value of the term (5−3) × (6+4)? To get an answer, we
first rewrite the term in the prefix form, ×(−(5,3),+(6,4)), and getting rid of parentheses we
obtain × − 5 3 + 6 4. This sequence of symbols implicitly represents the result (the number 20)
by describing a recipe for calculating it. Let us call this sequence the initial term. Now we can
compute the result by a series of reductions (elementary transformations) of the initial term:
× − 5 3 + 6 4 → × 2 + 6 4 → × 2 10 → 20. For example, in the first reduction we applied −
to 5 and 3 and then replaced the calculated subterm − 5 3 by its value 2. Note that there is a different
series of reductions that ends with the same result: × − 5 3 + 6 4 → × − 5 3 10 → × 2 10 → 20.
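
The reductions of Example 5.2 can be mimicked by a short program. The following Python
sketch is only an illustration under stated assumptions: the term is stored as a list of
tokens, the symbol * stands for ×, and the names OPS, reduce_once, and reduce_fully are
hypothetical helpers that do not appear in the text. The sketch always reduces the leftmost
subterm of the form "operator number number", so it reproduces the first of the two series.

```python
import operator

# Binary operators allowed in a prefix term (token names are an assumption).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def reduce_once(term):
    """One reduction: replace the leftmost 'op number number' by its value."""
    for i in range(len(term) - 2):
        op, a, b = term[i], term[i + 1], term[i + 2]
        if op in OPS and isinstance(a, int) and isinstance(b, int):
            return term[:i] + [OPS[op](a, b)] + term[i + 3:]
    return None  # nothing left to reduce: the term is final

def reduce_fully(term):
    """Return the whole series of terms, from the initial to the final one."""
    steps = [term]
    while (nxt := reduce_once(steps[-1])) is not None:
        steps.append(nxt)
    return steps

initial = ["*", "-", 5, 3, "+", 6, 4]
steps = reduce_fully(initial)
for t in steps:
    print(" ".join(map(str, t)))
```

Choosing a different subterm in reduce_once (for instance the rightmost one) would give
the second series of reductions, with the same final term.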


Example 5.3. (Motivation) Usually, a function f : ℕ → ℕ is defined by the equation f(x) =
[...x...], where the right-hand side is an expression containing x. Alternatively, we might define f
by the expression f = λx.[...x...], where λx would by convention indicate that x is a variable in
[...x...]. Then, instead of writing f(x), x = a, we could write λx.[...x...]a, knowing that a should
be substituted for each occurrence of x in [...x...]. For example, instead of writing $f(x) = x^y$ and
$g(y) = x^y$, which are two different functions, we would write $\lambda x.x^y$ and $\lambda y.x^y$, respectively.
Instead of writing f(x), x = 3 and g(y), y = 5 we would write $(\lambda x.x^y)3$ and $(\lambda y.x^y)5$, which would
result in $3^y$ and $x^5$, respectively. Looking now at $x^y$ as a function of two variables, we would
indicate this by $\lambda xy.x^y$. Its value at x = 3, y = 5 would then be $((\lambda xy.x^y)3)5 = (\lambda y.3^y)5 = 3^5$.
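
The last step of Example 5.3 has a direct counterpart in programming languages with
anonymous functions. The Python sketch below is an illustration only; the names power,
partial, and value are hypothetical, and Python evaluates by calling functions rather
than by rewriting terms. It writes $\lambda xy.x^y$ as two nested one-argument functions.

```python
# λxy.x^y written as nested one-argument functions
power = lambda x: lambda y: x ** y

partial = power(3)   # corresponds to (λxy.x^y)3 = λy.3^y
value = partial(5)   # corresponds to (λy.3^y)5 = 3^5
print(value)
```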


During 1931-1933, Church⁷ proposed, based on similar ideas, a model of computation
called the λ-calculus. We briefly describe it. (See Box 5.2 for the details.)

Let f be a function and $a_1,\ldots,a_n$ its arguments. Each $a_i$ can be a number or another
function with its own arguments. Thus, functions can nest within other functions.
Church proposed a way of describing f and $a_1,\ldots,a_n$ as a finite sequence of
symbols that implicitly represents the value $f(a_1,\ldots,a_n)$ by describing a recipe for
calculating it. He called this sequence the initial λ-term. The result $f(a_1,\ldots,a_n)$ is
computed by a systematic transformation of the initial λ-term into a final λ-term
that explicitly represents the value $f(a_1,\ldots,a_n)$. The transformation is a series of
elementary transformations, called reductions. Informally, a reduction of a λ-term
applies one of its functions, say g, to g’s arguments, say $b_1,\ldots,b_m$, and replaces the
λ-terms representing g and $b_1,\ldots,b_m$ with the λ-term representing $g(b_1,\ldots,b_m)$.


7 Alonzo Church, 1903-1995, American mathematician and logician.


Fig. 5.5 Alonzo Church 
(Courtesy: See Preface) 


Generally, there are several different transformations of a λ-term. However,
Church and Rosser⁸ proved that the final λ-term is practically independent of the
order in which the reductions are made; specifically, the final λ-term, when it exists,
is defined up to the renaming of its variables.

Church called the functions that can be defined by λ-terms λ-definable. It seemed
that any conceivable intuitively computable total numerical function was λ-definable
and could effectively be calculated in a mechanical manner. So this definition of
“computable” numerical functions seemed to fulfill the Completeness and Effectiveness
Requirements. Thus, Church proposed the following model of computation.


Model of Computation (Church’s Characterization): 


• An “algorithm” is a λ-term.
• A “computation” is a transformation of an initial λ-term into a final one.
• A “computable” function is a λ-definable total function.


Box 5.2 (λ-calculus).
Let f, g, x, y, z, ... be variables. Well-formed expressions will be called λ-terms.


A λ-term is a well-formed expression defined inductively as follows:


a. a variable is a λ-term (called an atom);
b. if M is a λ-term and x a variable, then (λx. M) is a λ-term (built from M by abstraction);
c. if M and N are λ-terms, then (MN) is a λ-term (called the application of M to N).


8 John Barkley Rosser, 1907-1989, American mathematician and logician.

Remarks. Informally, abstraction exposes a variable, say x, that occurs (or even doesn’t occur) in
a λ-term M and “elevates” M to a function of x. This function is denoted by λx. M and we say that
x is now bound in M by λx. A variable that is not bound in M is said to be free in M. A λ-term
can be interpreted as a function or an argument of another function. Thus, in general, the λ-term
(MN) indicates that the λ-term M (which is interpreted as a function) can be applied to the λ-term
N (which is interpreted as an argument of M). We also say that M can act on N. By convention, the
application is a left-associative operation, so (MNP) means ((MN)P). However, we can override
the convention by using parentheses. The outer parentheses are often omitted, so MN means (MN).
Any λ-term of the form (λx.M)N is called a β-redex (for reasons to become known shortly). A
λ-term may contain zero or more β-redexes.


λ-terms can be transformed into other λ-terms. A transformation is a series of one-step
transformations called β-reductions and denoted by $\rightarrow_\beta$. There are two rules to do a β-reduction:


1. α-conversion (denoted $\mapsto_\alpha$) renames a bound variable in a λ-term;
2. β-contraction (denoted $\mapsto_\beta$) transforms a β-redex (λx. M)N into a λ-term obtained from M by
substituting N for every occurrence of x in M that is bound by λx. Stated formally:
$(\lambda x.M)N \mapsto_\beta M[x := N]$.


Remarks. Intuitively, a β-contraction is an actual application (i.e., acting) of a function to its
arguments. However, before a β-contraction is started, we must apply all the necessary α-conversions
to M to avoid unintended effects of the β-contraction, such as unintended binding of N’s free
variables in M. When a λ-term contains no β-redexes, it cannot further be β-reduced. In this case the
term is said to be a β-normal form (β-nf). Intuitively, such a λ-term contains no function to apply.
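
The two rules can be sketched in a few lines of Python. This is a minimal illustration,
not a definitive implementation, and it rests on several assumptions that are not part of
the text: λ-terms are encoded as nested tuples ("var", x), ("lam", x, M), and ("app", M, N);
fresh names of the form y_0, y_1, ... are assumed not to occur in the input; and the
leftmost β-redex is always contracted first (one possible selection rule). The names
free_vars, substitute, step, and normalize are hypothetical.

```python
import itertools

_fresh = itertools.count()  # source of fresh variable names (assumption)

def free_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def substitute(t, x, n):
    """Return t[x := n]; bound variables are renamed (α-conversion) to avoid capture."""
    if t[0] == "var":
        return n if t[1] == x else t
    if t[0] == "app":
        return ("app", substitute(t[1], x, n), substitute(t[2], x, n))
    _, y, body = t
    if y == x:                       # x is rebound here, nothing to replace
        return t
    if y in free_vars(n):            # rename λy before substituting
        z = f"{y}_{next(_fresh)}"
        body = substitute(body, y, ("var", z))
        y = z
    return ("lam", y, substitute(body, x, n))

def step(t):
    """Contract the leftmost β-redex; return None if t is a β-normal form."""
    if t[0] == "app":
        if t[1][0] == "lam":         # t is a β-redex (λx.M)N
            return substitute(t[1][2], t[1][1], t[2])
        for i in (1, 2):
            r = step(t[i])
            if r is not None:
                return t[:i] + (r,) + t[i + 1:]
    elif t[0] == "lam":
        r = step(t[2])
        if r is not None:
            return ("lam", t[1], r)
    return None

def normalize(t, limit=1000):
    """Repeat β-reductions until a β-nf is reached or the step limit is hit."""
    for _ in range(limit):
        nxt = step(t)
        if nxt is None:
            return t
        t = nxt
    raise RuntimeError("no β-normal form found within the step limit")

# (λx.λy.x) y : the free y of the argument must not be captured by λy
term = ("app", ("lam", "x", ("lam", "y", ("var", "x"))), ("var", "y"))
result = normalize(term)
```

In the example, the bound variable y is renamed before the contraction, so the result
is a function that ignores its argument and returns the free variable y.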


A computation is a transformation of an initial λ-term $t_0$ with a sequence of β-reductions, that is

$$t_0 \rightarrow_\beta t_1 \rightarrow_\beta t_2 \rightarrow_\beta \cdots$$

If $t_i$ is a member of this sequence, we say that $t_0$ is β-reducible to $t_i$ and denote this by
$t_0 \twoheadrightarrow_\beta t_i$. The computation terminates if and when some β-nf $t_n$ is reached. This λ-term is
said to be final.

A non-final λ-term may have several β-redexes. Each of them is a candidate for the next
β-contraction. Since this selection usually affects the subsequent computation, we see that, in
general, there exist different computations starting in $t_0$. Hence the questions: “Which of the
possible computations is the ‘right’ one? Which of the possibly different final λ-terms is the ‘right’
result?” Fortunately, Church and Rosser proved that the order of β-reductions does not matter that
much. Specifically, the final λ-term, when it exists, is defined up to α-conversion (i.e., up to the
renaming of its bound variables). In other words: If different computations terminate, they return
practically equal results. This is the essence of the following theorem.


Theorem. (Church-Rosser) If $t_0 \twoheadrightarrow_\beta U$ and $t_0 \twoheadrightarrow_\beta V$, then there is W such that $U \twoheadrightarrow_\beta W$ and $V \twoheadrightarrow_\beta W$.


Since we may choose a terminating computation, we can standardize computations by fixing a rule
for selecting β-redexes. For example, we may always immediately reduce the leftmost β-redex. So
the initial term and the fixed rule make up a determinate algorithm for carrying out the computation.
Natural numbers are represented by (rather unintuitive) β-nfs $c_i$, called Church’s numerals:


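$$\begin{aligned}
c_0 &\equiv \lambda f x.\, f^0 x \equiv \lambda f.(\lambda x.\, x) \\
c_1 &\equiv \lambda f x.\, f^1 x \equiv \lambda f.(\lambda x.\, f x) \\
c_2 &\equiv \lambda f x.\, f^2 x \equiv \lambda f.(\lambda x.\, f(f x)) \\
&\;\;\vdots \\
c_n &\equiv \lambda f x.\, f^n x \equiv \lambda f.(\lambda x.\, f(\ldots(f x)\ldots))
\end{aligned}$$

A function $f : \mathbb{N}^k \to \mathbb{N}$ is λ-definable if there is a λ-term F such that

$$\text{if } f(n_1,\ldots,n_k) = \begin{cases} m & \text{then } F c_{n_1} \ldots c_{n_k} \twoheadrightarrow_\beta c_m; \\ \text{undefined} & \text{then } F c_{n_1} \ldots c_{n_k} \text{ has no } \beta\text{-nf.} \end{cases}$$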
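Example 5.4. (Addition) The function $sum(n_1,n_2) = n_1 + n_2$ can be defined in λ-calculus by a
λ-term $S \equiv \lambda abfx.\, af(bfx)$. To prove this, we must check that $S c_{n_1} c_{n_2} \twoheadrightarrow_\beta c_{n_1+n_2}$ for any $n_1, n_2$.
So we compute:

$$\begin{aligned}
S c_{n_1} c_{n_2} &\equiv (S c_{n_1})\, c_{n_2} \equiv ((\lambda abfx.\, af(bfx))\, c_{n_1})\, c_{n_2} \\
&\rightarrow_\beta (\lambda bfx.\, c_{n_1} f (bfx))\, c_{n_2} \\
&\rightarrow_\beta \lambda fx.\, c_{n_1} f (c_{n_2} f x) \equiv \lambda fx.\, (\lambda fx.\, f^{n_1} x)\, f\, ((\lambda fx.\, f^{n_2} x)\, f x) \\
&\rightarrow_\beta \lambda fx.\, (\lambda x.\, f^{n_1} x)\, ((\lambda fx.\, f^{n_2} x)\, f x) \\
&\rightarrow_\beta \lambda fx.\, (\lambda x.\, f^{n_1} x)\, ((\lambda x.\, f^{n_2} x)\, x) \\
&\rightarrow_\beta \lambda fx.\, (\lambda x.\, f^{n_1} x)\, (f^{n_2} x) \\
&\rightarrow_\beta \lambda fx.\, f^{n_1}(f^{n_2} x) \equiv \lambda fx.\, f^{n_1+n_2} x \equiv c_{n_1+n_2}
\end{aligned}$$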
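
Church’s numerals and the term S can also be tried out with ordinary Python functions.
The sketch below is an informal check under stated assumptions, not a proof: Python
applies functions eagerly instead of β-reducing terms, and the helper names church and
to_int are hypothetical. The numeral $c_n$ is modeled as a function that applies f to x
exactly n times.

```python
def church(n):
    """Model of the Church numeral c_n = λf.λx.f^n(x)."""
    def c(f):
        def apply_n_times(x):
            for _ in range(n):
                x = f(x)
            return x
        return apply_n_times
    return c

def to_int(c):
    """Read a numeral back by counting how often it applies its first argument."""
    return c(lambda k: k + 1)(0)

# S ≡ λabfx.af(bfx), written with one-argument functions
S = lambda a: lambda b: lambda f: lambda x: a(f)(b(f)(x))

total = to_int(S(church(2))(church(3)))
print(total)
```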


5.2.2 Modeling After Humans 


The second direction in the search for an appropriate model of computation was
an attempt to model computation after humans. The idea was to abstract the activity
of a human who mechanically solves computational problems. A seminal proposal
of this kind was made by Turing,⁹ whose model of computation was inspired both
by human “computation” and a real mechanical device (for details see Sect. 16.3.6).
In what follows we give an informal description of the model and postpone rigorous
treatment of it to later sections (see Sect. 6.1).


Turing Machine 


Turing took the quest for a definition of mechanical computation at face value. It 
has been suggested that he was inspired by his mother’s typewriter. In 1936, the 
then 24-year-old conceived his own model of computation, which he called the 
a-machine (automatic machine) and also the logical computing machine. Today, 
the model is called the Turing machine (or TM for short). 


Fig. 5.6 Alan Turing 
(Courtesy: See Preface) 


The Turing machine consists of several components (see Fig. 5.7): 


1. a control unit (corresponding to the human brain);

2. a potentially infinite tape divided into equally sized cells (corresponding to the
paper used during human “computation”);

3. a window that can move over any cell and makes it accessible to the control unit
(corresponding to the human eye and the hand with a pen).


9 Alan Mathison Turing, 1912-1954, British mathematician, logician, and computer scientist.


Fig. 5.7 Turing machine (a-machine) [figure labels: Turing program, control unit, window]


The control unit is always in some state from a finite set of states. Two of the states 
are the initial and the final state. There is also a program in the control unit (corre- 
sponding to the “algorithm” that a human uses when “computing” the solution of 
a given problem). We call it the Turing program. Different Turing machines have 
different Turing programs. 

Before the machine is started, the following must be done: 


1. an input word (i.e., the input data written in an alphabet Σ) is written on the tape;
2. the window is shifted to the beginning of the input word; 
3. the control unit is set to the initial state. 


From then on, the machine operates independently, in a purely mechanical fash- 
ion, step by step, as directed by its Turing program. At each step, the machine reads 
the symbol from the cell under the window into the control unit and, based on this 
symbol and the current state of the control unit, does the following: 


1. writes a symbol to the cell under the window (while deleting the old symbol); 
2. moves the window to one of the neighboring cells or leaves the window as it is; 
3. changes the state of the control unit. 


The machine halts if the control unit enters the final state or if its Turing program 
has no instruction for the next step. 
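
The step-by-step operation just described can be sketched as a small simulator. The
Python code below is a minimal illustration under assumptions that are not part of the
text: a Turing program is represented as a dictionary mapping (state, symbol) to
(symbol to write, move, next state), the moves are called "L", "R", and "S" (stay), the
blank symbol is "_", and the names run_tm and successor are hypothetical. The rigorous
definition of the Turing machine is given in Sect. 6.1.

```python
def run_tm(program, word, initial="q0", final="qf", blank="_", max_steps=10_000):
    tape = dict(enumerate(word))      # sparse tape: cell index -> symbol
    pos, state = 0, initial           # window over the first input symbol
    for _ in range(max_steps):
        symbol = tape.get(pos, blank)
        if state == final or (state, symbol) not in program:
            break                     # halt: final state or no instruction
        write, move, state = program[(state, symbol)]
        tape[pos] = write             # overwrite the cell under the window
        pos += {"L": -1, "S": 0, "R": 1}[move]
    else:
        raise RuntimeError("step limit reached; the machine may not halt")
    used = [i for i, s in tape.items() if s != blank]
    if not used:
        return ""
    return "".join(tape.get(i, blank) for i in range(min(used), max(used) + 1))

# A hypothetical Turing program: append one more 1 to a block of 1s
successor = {
    ("q0", "1"): ("1", "R", "q0"),    # skip over the input
    ("q0", "_"): ("1", "S", "qf"),    # write 1 on the first blank, then stop
}

print(run_tm(successor, "111"))
```

If a natural number n is represented by a block of n symbols 1, the program successor
computes the function n + 1. The step limit only protects the simulator; a Turing machine
itself has no such bound.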

How can Turing machines calculate the values of numerical functions? The general
idea is to associate a Turing machine T with the extensional mapping of T’s inputs
to T’s outputs. More specifically, take any Turing machine T and any k ∈ ℕ and then
define a function $f_T^{(k)}$ as follows:


If the input word to T represents natural numbers $n_1,\ldots,n_k \in \mathbb{N}$, and T halts, and
after halting the tape contents make up a word over Σ that represents a natural
number, say m, then let m be the value of the function $f_T^{(k)}$ at $n_1,\ldots,n_k$; that is,

$$f_T^{(k)}(n_1,\ldots,n_k) \overset{\text{def}}{=} m.$$


We can view $f_T^{(k)}$ as a function that maps k words over Σ into a word over Σ. The
values of $f_T^{(k)}$ can be mechanically computed simply by executing the Turing program
of T. Today, we say that the function $f_T^{(k)}$ is Turing-computable. Actually,
any function f for which there exists a Turing machine T such that $f = f_T^{(k)}$, for
some k, is said to be Turing-computable.




Turing believed that any conceivable intuitively computable total numerical func- 
tion can be effectively calculated with some Turing machine. To show this, he de- 
veloped several techniques for constructing Turing machines and found machines 
for many functions used in mathematics. Thus, it seemed that “computable” func- 
tions could be identified with Turing-computable total functions. In sum, it seemed 
that this definition of a “computable” function would fulfill the Completeness and 
Effectiveness Requirements. Hence, the following model of computation. 


Model of Computation (Turing’s Characterization): 


• An “algorithm” is a Turing program.
• A “computation” is an execution of a Turing program on a Turing machine.
• A “computable” function is a Turing-computable total function.


Box 5.3 (Memorizing During Computation). 


It seems that there is an important difference between a human and a Turing machine. A human 
can see only a finite portion of the scribbles under his or her nose, yet is able to remember some 
of the previously read ones and use them in the “computation”. It seems that the Turing machine’s 
control unit—lacking explicit storage—does not allow for this. 

However, this is not true. We will see later that the control unit can simulate finite storage. The 
basic idea is that the control unit memorizes the symbol that has been read from the tape by chang- 
ing to a state that corresponds to the symbol. This enables the Turing machine to take into account 
the memorized symbol during its operation. In order to implement this idea, the states and instruc- 
tions of the Turing machine have to be appropriately defined and interpreted during the execution. 
(Further details will be given in Sects. 6.1.2 and 6.1.3.) 


Example 5.5. (Addition) That $sum(n_1, n_2)$ is Turing-computable is shown in Example 6.1 (p.115).


5.2.3 Modeling After Languages 


The third direction in the search for an appropriate model of computation focused 
on modeling after languages. The idea was to view human mathematical activity as 
the transformation of a sequence of words (description of a problem) into another 
sequence of words (description of the solution of the problem), where the transfor- 
mation proceeds according to certain rules. Thus, the “computation” was viewed as 
a sequence of steps that gradually transform a given input expression into an output 
expression. The rules that govern the transformation are called productions. Each 
production describes how a current expression should be partitioned into fixed and 
variable sub-expressions and, if the partition is possible, how the sub-expressions 
must be changed and reordered to get the next expression. So, productions are of
the form

$$\alpha_0 x_1 \alpha_1 x_2 \ldots \alpha_{n-1} x_n \alpha_n \rightarrow \beta_0 x_{i_1} \beta_1 x_{i_2} \ldots \beta_{m-1} x_{i_m} \beta_m$$

where $\alpha_j$, $\beta_j$ are fixed sub-expressions and $x_j$ are variables whose values are
arbitrary sub-expressions of the current expression. Then, any finite set of productions
is called a grammar. Based on these ideas, Post and Markov proposed two models
of computation, called the Post machine and the Markov algorithms.


The Post Machine 


In the 1920s, Post investigated the decidability of logic theories. He viewed the 
derivations in theories as man’s language activity and, for this reason, developed his 
canonical systems. (We will return to these in Sect. 6.3.2.) The research on canonical 
systems forced Post to invent an abstract machine, now called the Post machine. 


Fig. 5.8 Emil Post
(Courtesy: See Preface)


The Post machine over the alphabet Σ is founded on a pair (G,Q), where G is
a directed graph and Q is a queue. The intention is that Q will receive input data
(encoded in Σ) and G will operate on Q, eventually leaving the encoded result in it.

We set Σ = {0, 1} since symbols of larger alphabets can be encoded in Σ; and we
introduce a symbol # to mark, when necessary, distinguished places in Q’s contents.

To implement the intention, we must define the basic instructions and how they
will be executed in G. Let $V = \{v_1,\ldots,v_n\}$ be the set of G’s vertices, for some
n ∈ ℕ, and let A ⊆ V × V be the set of G’s arcs. Each vertex $v_i \in V$ contains an
instruction, generically denoted by INST($v_i$), and each arc $(v_i,v_j) \in A$ can pass a
signal from $v_i$ to $v_j$. By definition, $v_1$ has no incoming arcs; this is the initial vertex.
Some, potentially none, of the other vertices have no outgoing arcs; these are the
final vertices. Each of the remaining vertices has at least one incoming arc and,
depending on its instruction, either one or four outgoing arcs.

The instructions are of four kinds. The first kind, the START instruction, is contained
in (and only in) the initial vertex $v_1$. This instruction executes as follows: (1) it
copies a given input word from G’s environment into the queue Q and (2) it triggers
the instruction INST($v_i$) by sending a signal via the arc $(v_1,v_i) \in A$. Of the second
kind are the instructions ACCEPT and REJECT. These are contained in (and only in)
the final vertices. When an ACCEPT (or REJECT) instruction is executed, the input word
is recognized as accepted (or rejected); thereafter the computation halts. The third
kind of instruction is ENQ(z), where z ∈ Σ ∪ {#}. When executed in a vertex $v_i$, the
instruction (1) attaches the symbol z to the end of Q, i.e., changes $Q = (x_1 \ldots x_m)$
to $Q = (x_1 \ldots x_m z)$, and (2) triggers INST($v_j$) via $(v_i,v_j) \in A$. The fourth kind of
instruction is DEQ(p,q,r,s), where 2 ≤ p,q,r,s ≤ n. When executed in $v_i$, it (1)
checks Q for emptiness and in this case triggers INST($v_p$) via $(v_i,v_p) \in A$; otherwise,
it removes the head of Q, i.e., changes $Q = (x_1 x_2 \ldots x_m)$ to $Q = (x_2 \ldots x_m)$,
and (2) triggers INST($v_q$), INST($v_r$), or INST($v_s$) via the corresponding arc, if the
removed head was 0, 1, or #, respectively.
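
The four kinds of instructions can be tried out with a short interpreter. The following
Python sketch is an illustration only and makes assumptions that are not in the text: the
graph G is given as a dictionary from vertex numbers to tuples such as ("ENQ", z, j) or
("DEQ", p, q, r, s), words are strings over 0, 1, and #, and the names run_post and
complement are hypothetical. The sample program complement is not the program of any
figure in this chapter; it flips every bit of the input word, using # to mark the end
of the original input.

```python
from collections import deque

def run_post(program, word, max_steps=10_000):
    """Execute a Post program; return the final instruction and the queue contents."""
    queue = deque()
    v = 1                                   # the initial vertex v_1
    for _ in range(max_steps):
        inst = program[v]
        kind = inst[0]
        if kind == "START":                 # ("START", i): load input, go to v_i
            queue.extend(word)
            v = inst[1]
        elif kind in ("ACCEPT", "REJECT"):  # final vertices: halt
            return kind, "".join(queue)
        elif kind == "ENQ":                 # ("ENQ", z, j): append z, go to v_j
            queue.append(inst[1])
            v = inst[2]
        elif kind == "DEQ":                 # ("DEQ", p, q, r, s)
            _, p, q, r, s = inst
            if not queue:
                v = p                       # empty queue
            else:
                v = {"0": q, "1": r, "#": s}[queue.popleft()]
    raise RuntimeError("step limit reached; the program may not halt")

# Hypothetical Post program: replace each 0 by 1 and each 1 by 0
complement = {
    1: ("START", 2),
    2: ("ENQ", "#", 3),        # mark the end of the input
    3: ("DEQ", 7, 4, 5, 6),    # empty -> v7, 0 -> v4, 1 -> v5, # -> v6
    4: ("ENQ", "1", 3),
    5: ("ENQ", "0", 3),
    6: ("ACCEPT",),
    7: ("REJECT",),
}

print(run_post(complement, "0110"))
```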

The graph G augmented with the above instructions and rules of execution is
called the Post program.¹⁰ (See Fig. 5.9.)


Fig. 5.9 A Post program [figure: a graph whose vertices carry DEQ, ENQ(0), ENQ(1), and ACCEPT instructions]


To develop the above ideas into a model of computation, the Post program must 
be supplemented by a supporting environment. The result is a structural view of the 
Post machine (see Fig. 5.10) that bears some resemblance to the Turing machine. 


Fig. 5.10 Post machine [figure labels: Post program, queue]


But there are differences too. The Post machine has a control unit that can store 
a Post program and execute its instructions; a potentially infinite read-only tape 
divided into equally sized cells; a window that can move to the right over any cell 
thus making it accessible to the control unit for reading; a queue that can store the 
input word (with input data), or the current word (with the intermediate results), or 
the final word (with the final result). 


10 In the late twentieth century, a similar model of computation, the data-flow graph, was used.
Here, vertices contain usual instructions such as arithmetical, logic, I/O, and jumps; arcs pass
intermediate results that act as triggering signals; the instruction in a vertex is triggered as soon
as the last awaited operand appears on an incoming arc; and several instructions can be triggered
simultaneously.

Before the machine is started, the following must be done: An input word (i.e.,
input data encoded in the alphabet Σ) is written on the tape; the window is positioned
over the first symbol of the input word; and the queue is emptied.

The computation starts by triggering the instruction START of the Post program. 
From then on, the machine operates independently, in a purely mechanical fashion, 
step by step, as directed by its Post program. At each step, the previously triggered 
instruction executes and triggers another instruction. The machine halts if and when 
one of the instructions ACCEPT or REJECT has executed. 

During the computation the contents of the queue gradually transform from the 
input word to a final word, the result of the computation. Note that the computation 
may never halt (when none of the final vertices is ever reached). 


Can Post machines compute the values of functions, say numerical functions?
Take any Post machine P and any k ∈ ℕ, and define a function $f_P^{(k)}$ as follows:


If the input word to P represents natural numbers $n_1,\ldots,n_k$, and P halts, and after
halting the queue contains a word over Σ that represents a natural number, say m,
then let m be the value of the function $f_P^{(k)}$ at $n_1,\ldots,n_k$; that is

$$f_P^{(k)}(n_1,\ldots,n_k) \overset{\text{def}}{=} m.$$


We can view $f_P^{(k)}$ as a function that maps k words over Σ into a word over Σ. The
values of $f_P^{(k)}$ can be mechanically computed by P. In general, we say that a function
f is Post-computable if there is a Post machine P such that $f = f_P^{(k)}$ for some k.
This brings us to the following model of computation.


Model of Computation (Post’s Characterization): 


• An “algorithm” is a Post program.
• A “computation” is an execution of a Post program on a Post machine.
• A “computable” function is a Post-computable total function.


Markov Algorithms 


Later, in the Soviet Union, similar reasoning was applied by Markov.¹¹ In 1951, he
described a model of computation that is now called the Markov algorithm.
A Markov algorithm is a finite sequence M of productions

$$\begin{aligned}
\alpha_1 &\rightarrow \beta_1 \\
\alpha_2 &\rightarrow \beta_2 \\
&\;\;\vdots \\
\alpha_n &\rightarrow \beta_n
\end{aligned}$$

where $\alpha_i$, $\beta_i$ are words over a given alphabet Σ. The sequence M is also called the
grammar. A production $\alpha_i \rightarrow \beta_i$ is said to be applicable to a word w if $\alpha_i$ is a subword
of w. If such a production is actually applied to w, it transforms w so that it
replaces the leftmost occurrence of $\alpha_i$ in w with $\beta_i$.


11 Andrey Andreyevič Markov, Jr., 1903-1979, Russian mathematician.

An execution of a Markov algorithm is a sequence of steps that gradually trans- 
form a given input word via a sequence of intermediate words into some output 
word. At each step, the last intermediate word is transformed by the first applicable 
production of M. Some productions are said to be final. If the last applied produc- 
tion was final, or if there was no production to apply, then the execution halts and 
the last intermediate word is the output word. 
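
The execution rule can be sketched directly. The Python code below is a minimal
illustration under stated assumptions: a production is represented as a triple
(alpha, beta, is_final), words are ordinary strings, and the names run_markov and
unary_add are hypothetical. At each step the first applicable production is applied to
the leftmost occurrence of its left-hand side.

```python
def run_markov(productions, word, max_steps=10_000):
    """Execute a Markov algorithm given as an ordered list of productions."""
    for _ in range(max_steps):
        for alpha, beta, is_final in productions:
            if alpha in word:                        # production is applicable
                word = word.replace(alpha, beta, 1)  # leftmost occurrence only
                if is_final:
                    return word                      # a final production halts
                break
        else:
            return word                              # nothing applicable: halt
    raise RuntimeError("step limit reached; the algorithm may not halt")

# Hypothetical example: numbers in unary, separated by "+"
unary_add = [("+", "", False)]

print(run_markov(unary_add, "111+11"))
```

With natural numbers written as blocks of the symbol 1, the single production that erases
the sign + turns the input 111+11 into 11111, i.e., it computes 3 + 2 = 5.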


Fig. 5.11 Andrey Markov 
(Courtesy: See Preface) 


Markov algorithms can be used to calculate the values of numerical functions.
Let M be any Markov algorithm and k ∈ ℕ. Then define a function $f_M^{(k)}$ as follows:


If the input word to M represents natural numbers $n_1,\ldots,n_k$, and M halts, and after
halting the output word represents a natural number, say m, then let m be the value
of the function $f_M^{(k)}$ at $n_1,\ldots,n_k$; that is

$$f_M^{(k)}(n_1,\ldots,n_k) \overset{\text{def}}{=} m.$$


We can view $f_M^{(k)}$ as a function from $(\Sigma^*)^k$ to $\Sigma^*$ whose values can be mechanically
computed by M. In general, we say that a function f is Markov-computable if
there is a Markov algorithm M such that $f = f_M^{(k)}$ for some k.

In sum, we can define the following model of computation. 


Model of Computation (Markov’s Characterization): 


• An “algorithm” is a Markov algorithm (grammar).
• A “computation” is an execution of a Markov algorithm.
• A “computable” function is a Markov-computable total function.


5.2.4 Reasonable Models of Computation


Although the described models of computation are completely different, they share 
an important property: They are reasonable. Clearly, we cannot prove this because 
reasonableness is not a formal notion. But we can give some facts in support. 

First we introduce new notions. An instance of a model of computation is called
an abstract computing machine. Examples of such a machine M are:


• a particular construction of a μ-recursive function (with the rules for using the
construction to calculate the function values);
• a particular system of equations $\mathcal{E}(f)$ (with substitution and replacement rules);
• a particular λ-term (with α-conversion and β-reduction rules);
• a particular Turing program (in a Turing machine);
• a particular Post program (in a Post machine); and
• a particular Markov grammar (with the rules of production application).


As the computation on an abstract computing machine M goes on, the status (e.g.,
contents, location, value, state) of each component of M may change. At any step
of the computation, the statuses of the relevant components of M make up the
so-called internal configuration of M at that step. Informally, this is a snapshot of M at
a particular point of its computation.


Now we return to the reasonableness of the models. For any abstract computing 
machine M belonging to any of the proposed models of computation, it holds that 


1. M is of limited capability. This is because: 


a. it has finitely many different basic instructions; 

b. each basic instruction takes at least one step to be executed; 

c. each basic instruction has a finite effect, i.e., the instruction causes a limited 
change in the current internal configuration of M. 


2. There is a finite-size description of M, called the code of M and denoted by (M). 

3. The code (M) is effective in the sense that, given an internal configuration of M, 
the code (M) enables us to “algorithmically construct” the internal configuration 
of M after the execution of an instruction of M. 

4. M is unrestricted, i.e., it has potentially infinite resources (e.g., time and space).


The proposed models of computation are reasonable from the standpoint of Computability
Theory, which is concerned with the questions “What is an algorithm?
What is computation? What can be computed?” Firstly, this is because these models
do not offer unreasonable computational power (see item 1 above). Secondly,
the answers to the questions are not influenced by any limitation of the computing
resources except that a finite computation must use only a finite amount of each of
the resources (see item 4). Hence, any problem that cannot be computed even with
unlimited resources will remain such if the available resources become limited.¹²


12 In Computational Complexity Theory, a large special part of Computability Theory, the
requirements (1c) and (4) are stiffer: In (1c) the effect of each instruction must be reasonably large
(not just finite); and in (4) the resources are limited, so their consumption is highly important.


5.3 Computability (Church-Turing) Thesis


The diversity of the proposed models of computation posed the obvious question: 
“Which model (if any) is the right one?” Since each of the models fulfilled the 
Effectiveness Requirement, the issue of effectiveness was replaced by the question 
“Which model is the most natural and appealing?” Of course, the opinions were 
subjective and sometimes different. As with the Completeness Requirement, it was 
not obvious whether the requirement was truly fulfilled by any of the models. In this 
section we will describe how the Computability Thesis, which in effect states that 
each of the proposed models fulfills this requirement, was born. (A systematic and 
detailed account of the origins and evolution of the thesis is given in Chap. 16.) 


5.3.1 History of the Thesis 


Church. By 1934, Kleene managed to prove that every conceivable intuitively computable
total numerical function was λ-definable. For this reason, in the spring
of 1934, Church conjectured that “computable” numerical functions are exactly
λ-definable total functions. In other words, Church suspected that the intuitive concepts
of algorithm and computation are appropriately formalized by his model of
computation. He stated this in the following thesis:


Church Thesis. “algorithm” ⟷ λ-term


Church presented the thesis to Gödel, the authority in mathematical logic of the
time, but Gödel rejected it as unsatisfactory. Why? At that time, Gödel was reflecting on
the relation between intuitively computable functions and (general) recursive functions.
He suspected that the latter might be a formalization of the former, yet he was
well aware of the fact that such an equivalence could not possibly be proved, because
it would equate two concepts of which one is formal (i.e., precisely defined)
and the other is informal (i.e., intuitive). In his opinion, researchers needed to continue
analyzing the intuitive concepts of algorithm and computation. Only after their
intrinsic components and properties were better understood would it make sense to
propose a thesis of this kind.

Shortly after, Kleene, Church, and Rosser proved the equivalence between the
λ-definable functions and the (general) recursive functions in the sense that every
λ-definable total function is (general) recursive total, and vice versa. Since λ-calculus
(being somewhat unnatural) was not well accepted in the research community,
Church restated his thesis in the terminology of the equivalent (general) recursive
functions and, in 1936, finally published it.

This, however, did not convince Gödel (and Post). In their opinion, the equivalence
of the two models still did not indicate that they fulfilled the Completeness
Requirement, i.e., that they fully captured the intuitive concepts of algorithm and
computation.

Turing. Independently, Turing pursued his research in England. He found that
every conceivable intuitively computable total numerical function was
Turing-computable. He also proved that the class of Turing-computable functions remains
the same under many generalizations of the Turing machine. So, he suspected that
“computable” functions are exactly Turing-computable total functions. That is, he
suspected that the intuitive concepts of algorithm and computation are appropriately
formalized by his model of computation. In 1936, he published his thesis.


Turing Thesis. “algorithm” ⟷ Turing program


Gödel accepted the Turing machine as the model of computation that convincingly
formalizes the concepts of “algorithm” and “computation”. He was convinced¹³
by the simplicity and generality of the Turing machine, its mechanical
working, its resemblance to human activity when solving computational problems,
Turing’s reasoning and analysis of intuitively computable functions, and his
argumentation that such functions are exactly Turing-computable total functions.

In 1937, Turing also proved that Church’s and his model are equivalent in the 
sense that what can be computed by one can also be computed by the other. That is, 


λ-definable ⟺ Turing-computable


He proved this by showing that each of the two models can simulate the basic in- 
structions of the other. 

Because of all of this, the remaining key researchers accepted the Turing ma- 
chine as the most appropriate model of computation. Soon, the Turing machine met 
general approbation. 


5.3.2 The Thesis 


As the two theses were equivalent, they were merged into the Church-Turing Thesis. 
More recently, this has also been given the neutral name Computability Thesis (CT). 


Computability Thesis. “algorithm” ⟷ Turing program (or equivalent model)


Gradually, it was proved that other models of computation are equivalent to the
Turing machine or some other equivalent model. This confluence of ideas, the equivalence
of diverse models, strengthened belief in the Computability Thesis.¹⁴


¹³ If the reader has doubts, he or she may compare Examples 5.1 (p. 83), 5.4 (p. 88), and 6.1 (p. 115).


¹⁴ Interestingly, a similar situation arose in physics ten years before. To explain the consequences
of the quantization of energy and the unpredictability of events in microscopic nature, two theories
were independently developed: matrix mechanics (Werner Heisenberg, 1925) and wave mechanics
(Erwin Schrödinger, 1926). Though the two theories were completely different, Schrödinger
proved that they are equivalent in the sense that physically they mean the same. This and their
capability of accurately explaining and predicting physical phenomena strengthened belief in the
quantum explanation of microscopic nature.

If the Computability Thesis is in truth correct, then, in principle, this cannot be
proved. (One would have to give rigorous arguments that an informal notion is equivalent
to another, formal one, but this was the very goal of the formalization described
above.) In contrast, if the thesis is wrong, then there may be a proof of this. (One
would have to conceive an intuitively computable function and prove that the function is
not Turing-computable.) However, until now no one has found such a proof.
Consequently, most researchers believe that the thesis holds.¹⁵

The Computability Thesis proclaimed the following formalization of intuitive 
basic concepts of computing: 


Formalization. Basic intuitive concepts of computing are formalized as follows: 


“algorithm” ⟷ Turing program
“computation” ⟷ execution of a Turing program on a Turing machine
“computable” function ⟷ Turing-computable total function


NB The Computability Thesis established a bridge between the intuitive concepts 
of “algorithm”, “computation”, and “computability” on the one hand, and their 
formal counterparts defined by models of computation on the other. In this way it 
finally opened the door to a mathematical treatment of these intuitive concepts. 


The situation after the acceptance of the Computability Thesis is shown in Fig. 5.12.


[Figure: each of the intuitive concepts “algorithm”, “computation”, and “computable” function is
linked by CT to its equivalent formalizations by Turing, Church, Gödel-Kleene, Post, Markov, and
Herbrand-Gödel.]


Fig. 5.12 Equivalent formalizations of intuitive concepts. By the Computability Thesis (CT), they
also adequately capture the intuitive concepts


The reader will find an extended treatment of the Computability Thesis in Chap. 16.


¹⁵ To this day, several new models of computation have been proposed and proved to be equivalent
to the Turing machine. Some of the notable ones are the register machine, which, in various variants
such as RAM and RASP, epitomizes modern computers (we will describe RAM in Sect. 6.2.7); the
cellular automaton (von Neumann, Conway, Wolfram), which is inspired by the development of
artificial organisms; and DNA computing (Adleman), which uses DNA molecules to compute.

At last we can refine Definition 1.1 and Fig. 1.2 (p. 4) to Definition 5.1 and Fig. 5.13.


[Figure: a Turing program controlling a Turing machine.]


Fig. 5.13 A Turing program directs a Turing machine to compute the solution to the problem


Definition 5.1. (Algorithm Formally) The algorithm for solving a problem 
is a Turing program that leads a Turing machine from the input data of the 
problem to the corresponding solution. 


NB Since the concepts of “algorithm” and “computation” are now formalized, we no longer need 
to use quotation marks to distinguish between their intuitive and formal meanings. In contrast, with 
the concept of “computable” function we will be able to do this only after we have clarified which 
functions (total or non-total too) we must talk about. This is the subject of the next two subsections. 


5.3.3 Difficulties with Total Functions 


Recall that in the search for a definition of “computable” functions some researchers
pragmatically focused on total numerical functions (see Sect. 5.2.1). These are the
functions $f : \mathbb{N}^k \to \mathbb{N}$ that map every k-tuple of natural numbers into a
natural number. However, it became evident that the restriction to total functions was too
severe. There were several reasons for this.


1. The Notion of Totality. Focusing on total functions tacitly assumed that we can
decide, for arbitrary f, whether f is a total function. This would be necessary,
for example, after each application of the μ-operation in μ-recursive function
construction (see Box 5.1, p. 82).

But finding out whether f is total may not be a finite process; in the extreme
case, we must check individually, for each k-tuple in $\mathbb{N}^k$, whether f is defined.
(Only later, in Sect. 9.4, will we be able to prove this. For starters, see Box 5.4
for a function which is potentially of this kind.)

Unfortunately, this meant that the formalization of the concept of “computable”
function had a serious weak point: it was founded on a notion (totality)
that is disputable in view of the finitist philosophy of mathematics.

Box 5.4 (Goldbach’s Conjecture).


In 1742, Goldbach¹⁶ proposed the following conjecture G:
G = “Every even integer greater than 2 is the sum of two primes.”

For example, 4 = 2+2; 6 = 3+3; 8 = 3+5; 10 = 3+7. But Goldbach did not prove G. What
is more, in spite of many attempts by a number of prominent mathematicians, to this date the
conjecture remains an open problem. In particular, no pattern was found such that, for every
natural n, 4+2n = p(n)+q(n) and p(n), q(n) are primes. (See discussion in Box 2.5, p. 21.)
Let us now define the Goldbach function g : ℕ → ℕ as follows:

$$g(n) \overset{\text{def}}{=} \begin{cases} 1 & \text{if } 4+2n \text{ is the sum of two primes;} \\ \text{undefined} & \text{otherwise.} \end{cases}$$

Is g total? This question is equivalent to the question of whether G holds. Yet, all attempts to
answer either of the questions have failed. What is worse, there is a possibility that G is one of
the undecidable truths of arithmetic, whose existence was proved by Gödel (see Sect. 4.2.3).

Does this mean that, lacking any pattern 4+2n = p(n)+q(n), the only way of finding out
whether g is total is by checking, for each natural n individually, whether 4+2n is the sum of
two primes? Indeed, in 2012, G was verified by computers for all even integers up to
$4 \cdot 10^{18}$. However, if g is in truth total, this is not a finite process, and will never
return the answer “g is total.” In sum:
For certain functions we might not be able to decide whether or not they are total.
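
The point of Box 5.4 can be tried out on a small scale. The following Python sketch is only an illustration: the names `is_prime`, `goldbach_g`, and `defined_up_to` are chosen here and do not come from the text, and `None` is used as a stand-in for "undefined". Each single value g(n) is found by a finite search, but checking totality means running that search for every n, which never finishes.

```python
from typing import Optional


def is_prime(p: int) -> bool:
    """Trial division; adequate for the small values used here."""
    if p < 2:
        return False
    d = 2
    while d * d <= p:
        if p % d == 0:
            return False
        d += 1
    return True


def goldbach_g(n: int) -> Optional[int]:
    """Return 1 if 4 + 2n is a sum of two primes; None stands for 'undefined'."""
    even = 4 + 2 * n
    for p in range(2, even // 2 + 1):
        if is_prime(p) and is_prime(even - p):
            return 1
    return None


def defined_up_to(bound: int) -> bool:
    """Check g(n) for n = 0, ..., bound - 1 only.

    A True result says nothing about arguments n >= bound.
    """
    return all(goldbach_g(n) == 1 for n in range(bound))
```

Calling `defined_up_to(bound)` with ever larger bounds mirrors the computer verification mentioned in the box: however large the bound, the answer "g is total" is never produced.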


2. Diagonalization. An even more serious consequence of the restriction to total 
functions was discovered by a method called diagonalization. 

By the Computability Thesis (as stated on p. 98) every intuitively computable
function is μ-recursive total (see p. 82). But is it really so? It was soon noticed
that there are countably infinitely many μ-recursive functions (as many as their
constructions), whereas it was known that there are uncountably many numerical
functions (see Appendix A, p. 368). So there are numerical functions that are not
μ-recursive, and a fortiori not μ-recursive total. But the question was: Is there
an intuitively computable numerical function that is not μ-recursive total? If there
is, how do we find it when all intuitively computable functions that researchers
have managed to conceive have turned out to be μ-recursive total?

Success came with a method called diagonalization. We will describe this
method in depth later, in Sect. 9.1, but we can still briefly describe the idea of the
construction. Using diagonalization, a certain numerical function g was defined
in a somewhat unusual way (typical of diagonalization), but it was still evident
how one could effectively calculate its values. So, g was intuitively computable.
But the definition of g was such that it implied a contradiction if g was supposed
to be μ-recursive and total. Hence, g could not be both μ-recursive and total!

It turned out that to prevent the contradiction it sufficed to omit only the
supposition that g was total. So, g was both μ-recursive non-total and intuitively
computable! This was an indication that the search for a formalization of intuitively
computable functions should be extended to all functions, total and non-total.


¹⁶ Christian Goldbach, 1690-1764, German mathematician.

Let us describe what has been said in more detail:


a. Constructions of μ-recursive total functions are finite sequences of symbols
over a finite alphabet. The set of all such constructions is well-ordered in
shortlex order.¹⁷ Thus, given arbitrary n ∈ ℕ, the nth construction is precisely
defined (in shortlex order). Let $f_n$ denote the μ-recursive total function whose
construction is nth in this order.


b. Define a function $g : \mathbb{N}^{k+1} \to \mathbb{N}$ as follows:


$$g(n, a_1, \ldots, a_k) = f_n(a_1, \ldots, a_k) + 1.$$


c. The function g is intuitively computable! Namely, the intuitive algorithm for
calculating its values for arbitrary $(a_1, \ldots, a_k) \in \mathbb{N}^k$ is straightforward:
• find the nth construction; then
• use the construction to calculate $f_n(a_1, \ldots, a_k)$; and then
• add 1 to the result.


d. Is the function g μ-recursive and total? Suppose it is. Then, there is a
construction of g, so g is one of the functions $f_1, f_2, \ldots$, i.e., $f_m = g$ for some m ∈ ℕ.


e. Let us now focus on the value g(m,m,...,m). The definition of g gives
$g(m,m,\ldots,m) = f_m(m,\ldots,m) + 1$. On the other hand, we have (by d above)
$f_m(m,\ldots,m) = g(m,m,\ldots,m)$. From these two equations we obtain


$$g(m,m,\ldots,m) = g(m,m,\ldots,m) + 1$$


and then 0 = 1, a contradiction. Thus, the supposition that g is μ-recursive and
total implies a contradiction. In sum, g cannot be both μ-recursive and total.


The conclusion is inescapable: There are intuitively computable numerical
functions that are not both μ-recursive and total! At first, researchers thought that
the definition of μ-recursive functions (see Box 5.1, p. 82) should be extended by
additional initial functions and/or rules of construction. But they realized that
this would not prevent the contradiction: Any extended definition of μ-recursive
functions that would be used to construct only total μ-recursive functions would
lead in the same way as above to the function g and the contradiction 0 = 1.

Did this refute the Computability Thesis? Fortunately not; the thesis only
had to be slightly adapted. Namely, it was noticed that no contradiction would
have occurred if the construction of all (total and non-total) μ-recursive functions
had been allowed. Why? If $f_1, f_2, \ldots$ was the sequence of all μ-recursive
functions, then $g = f_m$ could be non-total. Since the value g(m,m,...,m) could be
undefined, the equation g(m,m,...,m) = g(m,m,...,m) + 1 would not inevitably
be contradictory, as undefined plus one is still undefined.
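
The diagonal construction can be tried out on a toy stand-in. The Python sketch below assumes a hypothetical finite list `FUNCTIONS` of total one-argument functions in place of the infinite shortlex enumeration (so k = 1, and indices start at 0 rather than 1); the names `g`, `diagonal`, and `differs_from` are chosen here for illustration. It only shows why the function a ↦ g(a, a) cannot coincide with any listed function.

```python
from typing import Callable, List

# Hypothetical stand-in for the enumeration f_1, f_2, ...: a finite list of
# total functions N -> N. The real enumeration is infinite.
FUNCTIONS: List[Callable[[int], int]] = [
    lambda a: 0,        # constant zero
    lambda a: a + 1,    # successor
    lambda a: 2 * a,    # doubling
    lambda a: a * a,    # squaring
]


def g(n: int, a: int) -> int:
    """g(n, a) = f_n(a) + 1: look up the nth function, apply it, add 1."""
    return FUNCTIONS[n](a) + 1


def diagonal(a: int) -> int:
    """The diagonal values g(a, a) = f_a(a) + 1."""
    return g(a, a)


def differs_from(m: int) -> bool:
    """True if the diagonal differs from f_m at the argument m."""
    return diagonal(m) != FUNCTIONS[m](m)
```

For every index m in the list, `differs_from(m)` holds, because f_m(m) + 1 can never equal f_m(m) when f_m(m) is defined. This is exactly the step that fails once non-total functions are admitted to the enumeration.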


¹⁷ Sort the constructions by increasing length, and those of the same length lexicographically.

3. Experience. Certain well-known numerical functions were not total. For example,
$\mathrm{rem} : \mathbb{N}^2 \to \mathbb{N}$, defined by rem(m,n) = “remainder from dividing m by n.”
This function is not total because it is not defined for pairs (m,n) ∈ ℕ × {0}.
Nevertheless, rem could be treated as intuitively computable because of the
following algorithm: If n > 0, then compute rem(m,n) and halt; otherwise, return
a warning (e.g., −1 ∉ ℕ). Notice that the algorithm returns the value whenever
rem is defined; otherwise, it warns that rem is not defined.


4. Computable Total Extensions. We can view the algorithm in the previous
paragraph as computing the values of some other, but total function rem*, defined
by

$$\mathrm{rem}^*(m,n) \overset{\text{def}}{=} \begin{cases} \mathrm{rem}(m,n) & \text{if } (m,n) \in \mathbb{N} \times (\mathbb{N} - \{0\}); \\ -1 & \text{otherwise.} \end{cases}$$


The function rem* has the same values as rem, whenever rem is defined. We say
that rem* is a total extension of rem to ℕ × ℕ. Now, if every intuitively
computable partial function had a “computable” total extension, then there would be
no need to consider partial functions in order to study “computability”.
However, we will see that this is not the case: There are intuitively computable partial
functions that have no “computable” total extensions. This is yet another reason
for the introduction of partial functions in the study of “computable” functions.
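
The relation between rem and its total extension rem* can be written down directly. In this minimal Python sketch, the names `rem` and `rem_star` follow the text, while the use of `None` to represent "undefined" is an assumption made here for illustration.

```python
from typing import Optional


def rem(m: int, n: int) -> Optional[int]:
    """Partial function: None stands for 'undefined' (the case n = 0)."""
    if n == 0:
        return None
    return m % n


def rem_star(m: int, n: int) -> int:
    """Total extension: agrees with rem where rem is defined, -1 otherwise."""
    value = rem(m, n)
    return -1 if value is None else value
```

The warning value −1 works here only because it is easy to decide whether rem is defined for a given pair; as noted above, such a computable total extension is not available for every partial function.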


5.3.4 Generalization to Partial Functions 


The difficulties described in the previous section indicated that the definition of a
“computable” function should be founded on partial functions instead of only total
functions. So, let us recall the basic facts about partial functions and introduce some
useful notation. We will be using Greek letters to denote partial functions, e.g., φ.


Definition 5.2. (Partial Function) We say that φ : A → B is a partial function
if φ may be undefined for some elements of A. (See Fig. 5.14.) If φ is defined
for a ∈ A, we write


φ(a)↓;


otherwise we write φ(a)↑. The set of all the elements of A for which φ is
defined is the domain of φ, denoted by


dom(φ).


Hence dom(φ) = {a ∈ A | φ(a)↓}. Thus, for partial φ we have dom(φ) ⊆ A.
In the special case when dom(φ) = A, we omit the adjective partial and say
that φ is a total function (or just a function). When it is clear that a function
is total, we denote it by a Latin letter, e.g., f, g.
The expression

φ(a)↓ = b

says that φ is defined for a ∈ A and its value is b. The set of all elements of B
that are φ-images of elements of A is the range of φ, denoted by


rng(φ).


Hence rng(φ) = {b ∈ B | ∃a ∈ A : φ(a)↓ = b}. The function φ is surjective if
rng(φ) = B, and it is injective if different elements of dom(φ) are mapped
into different elements of rng(φ). Partial functions φ : A → B and ψ : A → B
are said to be equal, and denoted by


φ ≃ ψ


if they have the same domains and the same values; that is, for every x ∈ A it
holds that φ(x)↓ ⟺ ψ(x)↓ and φ(x)↓ ⟹ φ(x) = ψ(x).


Fig. 5.14 Partial function φ : A → B is defined on the set dom(φ) ⊆ A
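
For finite sets, the notions of Definition 5.2 can be made concrete. The sketch below is an illustration under the assumption that a partial function φ : A → B is stored as a Python dict whose keys are exactly dom(φ); the helper names are chosen here and are not part of the text.

```python
from typing import Dict, Hashable, Iterable, Set

# A finite partial function: keys form dom(phi); a missing key means "undefined".
PartialFn = Dict[Hashable, Hashable]


def dom(phi: PartialFn) -> Set[Hashable]:
    """The set of arguments on which phi is defined."""
    return set(phi)


def rng(phi: PartialFn) -> Set[Hashable]:
    """The set of values phi actually takes."""
    return set(phi.values())


def is_total(phi: PartialFn, A: Iterable[Hashable]) -> bool:
    """phi is total on A if dom(phi) = A."""
    return dom(phi) == set(A)


def is_surjective(phi: PartialFn, B: Iterable[Hashable]) -> bool:
    """phi is surjective if rng(phi) = B."""
    return rng(phi) == set(B)


def is_injective(phi: PartialFn) -> bool:
    """Different elements of dom(phi) have different images."""
    return len(rng(phi)) == len(phi)


def equal(phi: PartialFn, psi: PartialFn) -> bool:
    """Same domain and the same value wherever defined."""
    return phi == psi
```

For example, with A = {0, 1, 2, 3} and `phi = {0: "a", 1: "b", 3: "a"}`, the function is undefined at 2, so it is partial but not total, and it is not injective because 0 and 3 share an image.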


The improved definition of an intuitively computable function (the definition that
would consider non-total functions in addition to total ones) was expected to retain
the intuitive appeal of the previous definition (see p. 98). Fortunately, this was not
difficult to achieve. One had only to take into account all the possible outcomes of
calculating the function’s values for arguments where the function is not defined.

So let φ : A → B be a partial function, a ∈ A, and suppose that φ(a)↑. There are
two possible outcomes when an attempt is made to calculate φ(a):


1. the computation halts and returns a nonsensical result (not belonging to rng(φ));
2. the computation does not halt.

In the first outcome, the nonsensical result is also the signal (i.e., warning) that φ is
not defined for a. The second outcome is the trying one: Neither do we receive the
result nor do we receive the warning that φ(a)↑. As long as the computation goes
on, we can only wait and hope that the result will soon come, not knowing that all is in
vain (Fig. 5.15). The situation is even worse: We will prove later, in Sect. 8.2, that
there is no general way to find out whether all is in vain. In other words, in general we
cannot find out that we are victims of the second outcome.


[Figure: an observer of a running computation wonders: “To wait, or not to wait: that is the question.”]


Fig. 5.15 What does one do if a computation on a Turing machine is still going on?
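
The second outcome can be illustrated with an unbounded search. The following Python sketch is an assumption-laden illustration, not the book's construction: `mu_search` and its optional `budget` parameter are hypothetical names introduced here. Without a budget the loop runs forever when no witness exists; with a budget, a missing answer does not reveal whether waiting longer would have helped.

```python
from typing import Callable, Optional


def mu_search(pred: Callable[[int], bool],
              budget: Optional[int] = None) -> Optional[int]:
    """Return the least x >= 0 with pred(x) true.

    With budget=None the search is unbounded and does not halt if no such x
    exists. With a budget, None only means 'no answer among the first
    `budget` candidates'; it is not a proof that the value is undefined.
    """
    x = 0
    while budget is None or x < budget:
        if pred(x):
            return x
        x += 1
    return None
```

For instance, `mu_search(lambda x: x * x >= 50)` halts with a result, whereas `mu_search(lambda x: x * x == 50, budget=10_000)` gives up with `None`. Here a simple argument shows that no witness exists, but in general no such argument is available.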


Since deciding which of the two outcomes will take place is generally impossible, 
we will not try to distinguish between them. We will not concern ourselves with 
what is going on when an attempt is made to compute an undefined function value. 
We will be satisfied with a new definition, which, in essence, says that 


A partial function is “computable” if there is an algorithm 
that can compute its value whenever the function is defined. 


When it is known that such a function is total, we will drop the adjective partial. We
can now give a formalization of the concept of “computable” partial function.


Formalization. (cont’d from p. 98) The intuitive concept of “computable”
partial function φ : A → B is formalized as follows:


φ is “computable” if there exists a TM that can compute the value
φ(x) for any x ∈ dom(φ), and dom(φ) = A;
φ is partial “computable” if there exists a TM that can compute the value
φ(x) for any x ∈ dom(φ);
φ is “incomputable” if there is no TM that can compute the value
φ(x) for any x ∈ dom(φ).


Note that Fig. 5.12 (p. 98) is valid for the new definition of “computable” functions. 


NB Since the concept of a “computable” function is now formalized, we will no longer use
quotation marks to distinguish between its intuitive and formal meanings. From now on, when we
say that a function is computable, it will be tacitly understood that it is total (by definition).
And when we say that a function is partial computable (or p.c. for short), we will be aware of the
fact that the function may or may not be total.

Remarks. 1) In the past, the naming of computable and partial computable functions was not
uniform. In 1996, Soare suggested modernizing and unifying the terminology. Accordingly,
we use the terms computable function (instead of the older term recursive function) and partial
computable function (instead of the older term partial recursive function). The reasons for the
unification will be described on p. 153. 2) Above, we have defined the concept of an incomputable
function. But do such functions really exist? A clue that they do exist was given in Sect. 5.3.3,
where it was noticed that numerical functions outnumber the constructions of μ-recursive (i.e., p.c.)
functions. We will be able to construct a particular such function later (see Sects. 8.2.3 and 8.3.1).


A Generalization 


Later, a slight generalization of the above formalization will ease our expression. Let
φ : A → B be a partial function and S ⊆ A an arbitrary set. Observe that even if φ
is incomputable, there may still exist a Turing machine that is capable of computing
the value φ(x) for arbitrary x ∈ S ∩ dom(φ). (See Fig. 5.16.) In this case, we will
say that φ is partial computable on the set S.¹⁸ If, in addition, S ⊆ dom(φ), then
we will say that φ is computable on the set S.¹⁹ Here is the official definition.


Fig. 5.16 φ : A → B is partial computable (p.c.) on S ⊆ A if it can be computed
everywhere on S ∩ dom(φ)


Definition 5.3. (Computability on a Set) Let φ : A → B be a partial function
and S ⊆ A. We say that:


φ is computable on S if there exists a TM that can compute the value
φ(x) for any x ∈ S ∩ dom(φ), and S ⊆ dom(φ);
φ is partial computable on S if there exists a TM that can compute the value
φ(x) for any x ∈ S ∩ dom(φ);
φ is incomputable on S if there is no TM that can compute the value
φ(x) for any x ∈ S ∩ dom(φ).


If we take S = A, the definition transforms into the above formalization. 


¹⁸ Equivalently, φ : A → B is p.c. on the set S ⊆ A if the restriction $\varphi|_S$ is a p.c. function.
¹⁹ Equivalently, φ : A → B is computable on the set S ⊆ dom(φ) if $\varphi|_S$ is a computable function.

5.3.5 Applications of the Thesis


The Computability Thesis (CT) is useful in proving the existence or non-existence 
of certain “algorithms”. Such proofs are called proofs by CT. There are two cases: 


1. Suppose we want to prove that a given function φ is p.c. We might try to construct
a TM and prove that it is capable of computing the values of φ. However, this
approach is cumbersome and prone to mistakes. Instead, we can do the following:


a. informally describe an “algorithm” that “computes” the values of φ;
b. refer to the CT (saying: By CT the “algorithm” can be replaced by some
Turing program; hence φ is p.c.).


2. Suppose we want to prove that a function φ is “incomputable”. To do this, we


a. prove that φ is not Turing-computable (i.e., there exists no TM for computing
the values of φ);
b. refer to the CT (saying: Then, by CT, φ is “incomputable”).


5.4 Chapter Summary 


Hilbert’s Program left open the Entscheidungsproblem, the problem that was call- 
ing for an algorithm that would, for any mathematical formula, decide whether the 
formula can be derived. 

Soon it became clear that the problem could not be solved unless the intuitive, 
loose definition of the concept of the algorithm was replaced by a rigorous, formal 
definition. Such a definition, called the model of computation, should characterize 
the notions of “algorithm”, “computation”, and “computable”. 

In 1930, the search for an appropriate model of computation started. Different
ideas arose and resulted in several totally different models of computation: the
μ-recursive functions, (general) recursive functions, λ-definable functions, the Turing
machine, the Post machine, and Markov algorithms. Although diverse, the models
shared two important properties, namely that they were reasonable and they fulfilled
the Effectiveness Requirement.

The Turing machine was accepted by many as the most appropriate model of 
computation. Surprisingly, it turned out that all the models are equivalent in the 
sense that what can be computed by one can also be computed by the others. 

Finally, the Computability Thesis equated the informally defined concepts of “al- 
gorithm”, “computation”, and “computable” with the counterparts that were for- 
mally defined by the models of computation. In effect, the thesis declared that all 
these models of computation also fulfill the Completeness Requirement. It was also 
found that the definition of a “computable” function must be founded on partial 
functions instead of only on total functions. 

All in all, the Computability Thesis made it possible to mathematically treat the
intuitive concepts of computation.

Problems

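5.1. Prove that these functions are primitive recursive:
(a) $\mathrm{const}^k_j(n_1,\ldots,n_k) \overset{\text{def}}{=} j$, for $j \geq 0$ and $k \geq 1$;
(b) $\mathrm{add}(m,n) \overset{\text{def}}{=} m+n$;
(c) $\mathrm{mult}(m,n) \overset{\text{def}}{=} mn$;
(d) $\mathrm{power}(m,n) \overset{\text{def}}{=} m^n$;
(e) $\mathrm{fact}(n) \overset{\text{def}}{=} n!$;
(f) $\mathrm{tower}(m,n) \overset{\text{def}}{=} m^{m^{\cdot^{\cdot^{\cdot^{m}}}}}$ (n levels);
(g) $\mathrm{minus}(m,n) \overset{\text{def}}{=} m \dot{-} n = \begin{cases} m-n & \text{if } m \geq n; \\ 0 & \text{otherwise;} \end{cases}$
(h) $\mathrm{div}(m,n) \overset{\text{def}}{=} m \div n = \lfloor m/n \rfloor$;
(i) $\mathrm{floorlog}(n) \overset{\text{def}}{=} \lfloor \log_2 n \rfloor$;
(j) $\log^*(n) \overset{\text{def}}{=}$ the smallest k such that $\log(\log(\cdots(\log(n))\cdots)) \leq 1$, where log is applied k times;
(k) $\gcd(m,n) \overset{\text{def}}{=}$ greatest common divisor of m and n;
(l) $\mathrm{lcm}(m,n) \overset{\text{def}}{=}$ least common multiple of m and n;
(m) $\mathrm{prime}(n) \overset{\text{def}}{=}$ the nth prime number;
(n) $\pi(x) \overset{\text{def}}{=}$ the number of primes not exceeding x;
(o) $\phi(n) \overset{\text{def}}{=}$ the number of positive integers that are ≤ n and relatively prime to n (Euler funct.);
(p) $\max^k(n_1,\ldots,n_k) \overset{\text{def}}{=} \max\{n_1,\ldots,n_k\}$, for $k \geq 1$;
(q) $\min^k(n_1,\ldots,n_k) \overset{\text{def}}{=} \min\{n_1,\ldots,n_k\}$, for $k \geq 1$;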

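(r) $\mathrm{neg}(x) \overset{\text{def}}{=} \begin{cases} 0 & \text{if } x \geq 1; \\ 1 & \text{if } x = 0; \end{cases}$
(s) $\mathrm{and}(x,y) \overset{\text{def}}{=} \begin{cases} 1 & \text{if } x \geq 1 \wedge y \geq 1; \\ 0 & \text{otherwise;} \end{cases}$
(t) $\mathrm{or}(x,y) \overset{\text{def}}{=} \begin{cases} 1 & \text{if } x \geq 1 \vee y \geq 1; \\ 0 & \text{otherwise;} \end{cases}$
(u) $\text{if-then-else}(x,y,z) \overset{\text{def}}{=} \begin{cases} y & \text{if } x \geq 1; \\ z & \text{otherwise;} \end{cases}$
(v) $\mathrm{eq}(x,y) \overset{\text{def}}{=} \begin{cases} 1 & \text{if } x = y; \\ 0 & \text{otherwise;} \end{cases}$
(w) $\mathrm{gr}(x,y) \overset{\text{def}}{=} \begin{cases} 1 & \text{if } x > y; \\ 0 & \text{if } x \leq y; \end{cases}$
(x) $\mathrm{geq}(x,y) \overset{\text{def}}{=} \begin{cases} 1 & \text{if } x \geq y; \\ 0 & \text{if } x < y; \end{cases}$
(y) $\mathrm{ls}(x,y) \overset{\text{def}}{=} \begin{cases} 1 & \text{if } x < y; \\ 0 & \text{if } x \geq y; \end{cases}$
(z) $\mathrm{leq}(x,y) \overset{\text{def}}{=} \begin{cases} 1 & \text{if } x \leq y; \\ 0 & \text{if } x > y. \end{cases}$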


5.2. Prove: If $f : \mathbb{N} \to \mathbb{N}$ is primitive recursive, then so is $g(n,m) = f^{(n)}(m) = f(f(\cdots(f(m))\cdots))$, where f is applied n times.
5.3. Prove: Every primitive recursive function is total.
5.4. Prove: Every μ-recursive function can be obtained from the initial functions $\zeta, \sigma, \pi^k_i$ by
a finite number of compositions and primitive recursions and at most one μ-operation.


Definition 5.4. (Ackermann Function) A version $A : \mathbb{N}^2 \to \mathbb{N}$ of the Ackermann function
is defined as follows:


A(0, n) = n + 1 (1)
A(m+1, 0) = A(m, 1) (2)
A(m+1, n+1) = A(m, A(m+1, n)) (3)
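
Equations (1) to (3) can be evaluated mechanically. The following is a minimal Python sketch of such an evaluation; the function name `ackermann` and the use of an explicit stack in place of recursion are choices made here (to stay clear of Python's recursion limit), not part of the definition.

```python
def ackermann(m: int, n: int) -> int:
    """Evaluate A(m, n) from equations (1)-(3) using an explicit stack.

    The stack holds the first arguments of the calls that are still waiting
    for their second argument; n carries the most recent intermediate value.
    """
    stack = [m]
    while stack:
        m = stack.pop()
        if m == 0:
            n = n + 1                 # (1) A(0, n) = n + 1
        elif n == 0:
            stack.append(m - 1)       # (2) A(m, 0) = A(m - 1, 1)
            n = 1
        else:
            stack.append(m - 1)       # (3) outer call A(m - 1, ...)
            stack.append(m)           #     inner call A(m, n - 1) runs first
            n = n - 1
    return n
```

Because the values grow extremely fast, only very small arguments are feasible in practice.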


Remark. What is the intuition behind the Ackermann function? Imagine a sequence of binary
operations $o_1, o_2, o_3, \ldots$ on $\mathbb{N}^2$, where each operation is defined by the preceding one as follows:
Given arbitrary x, y ∈ ℕ, the value $o_k(x,y)$ is obtained by applying x to itself y times using $o_{k-1}$;
in particular, $o_1$ applies x to itself y times using the successor function, i.e., $o_1(x,y) = x+y$.

The first operations are
$o_1(x,y) = x + \underbrace{1 + \ldots + 1}_{y \text{ times}} = x + y$,
$o_2(x,y) = \underbrace{x + x + \ldots + x}_{y \text{ times}} = x \cdot y$,
$o_3(x,y) = \underbrace{x \cdot x \cdot \ldots \cdot x}_{y \text{ times}} = x^y$, and
$o_4(x,y) = \underbrace{x^{x^{\cdot^{\cdot^{\cdot^{x}}}}}}_{y \text{ times}}$.

As k increases, the values $o_k(x,y)$ grow extremely fast. We can view k as the third variable and
define a new function $\mathrm{ack} : \mathbb{N}^3 \to \mathbb{N}$ by $\mathrm{ack}(k,x,y) \overset{\text{def}}{=} o_k(x,y)$. This is the so-called Ackermann
generalized exponential. The function A in Definition 5.4 can be obtained from ack.

5.5. Let A be the Ackermann function as defined above.


(a) Prove: A is intuitively computable.


[Hint. A(0,n) is the successor function, hence intuitively computable. Suppose that, for all n,
A(m,n) is intuitively computable. To see that A(m+1,n) is intuitively computable
we repeatedly apply (3) (to obtain A(m+1,0)) and then (2) (to decrease m+1).]


(b) Prove: A is a μ-recursive function.
(c) Try to compute A(k,k) for k = 0,1,2,3.


(d) The function A grows faster than any primitive recursive function in the following sense:
For every primitive recursive f(n), there is an $n_0 \in \mathbb{N}$ such that f(n) < A(n,n) for all $n > n_0$.
Can you prove that?

(Remark. This is why A is not primitive recursive.)


Chapter 6
The Turing Machine


A machine is a mechanically, electrically, or electronically 
operated device for performing a task. A program is a sequence 
of coded instructions that can be inserted into a machine to 
control its operation. 


Abstract The Turing machine convincingly formalized the concepts of “algorithm”, 
“computation”, and “computable”. It convinced researchers by its simplicity, gen- 
erality, mechanical operation, and resemblance to human activity when solving 
computational problems, and by Turing’s reasoning and analysis of “computable” 
functions and his argumentation that partial “computable” functions are exactly 
Turing-computable functions. Turing considered several variants that are general- 
izations of the basic model of his machine. But he also proved that they add noth- 
ing to the computational power of the basic model. This strengthened belief in the 
Turing machine as an appropriate model of computation. Turing machines can be 
encoded and consequently enumerated. This enabled the construction of the univer- 
sal Turing machine, which is capable of computing anything that can be computed 
by any other Turing machine. This seminal discovery laid the theoretical grounds for 
several all-important practical consequences, the general-purpose computer and the 
operating system being the most notable. The Turing machine is a versatile model 
of computation: It can be used to compute values of a function, or to generate ele- 
ments of a set, or to decide about the membership of an object in a set. The last led 
to the notions of decidable and semi-decidable sets, which would later prove to be 
very important in solving general computational problems. 


6.1 Turing Machine 


Most researchers accepted the Turing machine (TM) as the most appropriate model 
of computation. We described the Turing machine briefly in Sect. 5.2.2. In this sec- 
tion we will go into detail. First, we will describe the basic model of the TM. We 
will then introduce several other variants that are generalizations of the basic model. 
Finally, we will prove that, from the viewpoint of general computability, they add 
nothing to the computational power of the basic model. This will strengthen our 
belief in the basic Turing machine as a simple yet highly powerful model of computation.


Fig. 6.1 Alan Turing (Courtesy: See Preface)


6.1.1 Basic Model 


Definition 6.1. (Turing Machine) The basic variant of the Turing machine 
has the following components (Fig. 6.2): a control unit containing a Turing 
program; a tape consisting of cells; and a movable window over the tape, 
which is connected to the control unit. (See also the details below.) 


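[Figure: a control unit containing the Turing program, connected to a window positioned over a cell of the tape.]

Fig. 6.2 Turing machine (basic model)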


The details are: 


1. The tape is used for writing and reading the input data, intermediate data, and 
output data (results). It is divided into equally sized cells, and is potentially infi- 
nite in one direction (i.e., whenever needed, it can be extended in that direction 
with a finite number of cells). 


In each cell there is a tape symbol belonging to a finite tape alphabet
$\Gamma = \{z_1, \ldots, z_t\}$, $t \geq 3$. The symbol $z_t$ is special, for it indicates that a cell is
empty; for this reason it is denoted by ␣ and called the empty space. In addition to ␣ there
are at least two¹ additional symbols: 0 and 1. We will take $z_1 = 0$ and $z_2 = 1$.


¹ The reasons for at least two additional symbols are mostly practical (leaving out of consideration
the non-polynomial relation between the lengths of unary and binary representation of data, which
is important in Computational Complexity Theory). Only once, in Sect. 8.3.1, will we come across
Turing machines that need just one additional tape symbol (which will be the symbol 1). There,
we will simply ignore the other additional symbol.

The input data is contained in the input word. This is a word over some finite
input alphabet Σ such that {0,1} ⊆ Σ ⊆ Γ − {␣}. Initially, all the cells are empty
(i.e., each contains ␣) except for the leftmost cells, which contain the input word.


The control unit is always in some state belonging to a finite set of states
$Q = \{q_1, \ldots, q_s\}$, where $s \geq 1$. We call $q_1$ the initial state. Some states are said
to be final; they are gathered in the set F ⊆ Q. All the other states are non-final.
When the index of a state is of no importance, we will use $q_{yes}$ and $q_{no}$ to
refer to any final and non-final state, respectively.


There is a program called the Turing program (TP) in the control unit. The program
directs the components of the machine. It is characteristic of the particular
Turing machine, that is, different TMs have different TPs. Formally, a Turing
program is a partial function δ : Q × Γ → Q × Γ × {Left, Right, Stay}. It is also
called the transition function. We can view δ as a table Δ = Q × Γ, where the
component $\Delta[q_i, z_r] = (q_j, z_w, D)$ if $\delta(q_i, z_r) = (q_j, z_w, D)$ is an instruction of δ,
and $\Delta[q_i, z_r] = 0$ if $\delta(q_i, z_r)\uparrow$ (see Fig. 6.3). Without loss of generality we assume
that $\delta(q_{no}, z)\downarrow$ for some z ∈ Γ, and $\delta(q_{yes}, z)\uparrow$ for all z ∈ Γ. That is, there is
always a transition from a non-final state, and none from a final state.


The window can move over any single cell, thus making the cell accessible to 
the control unit. The control unit can then read a symbol from the cell under the 
window, and write a symbol to the cell, replacing the previous symbol. In one 
step, the window can only move to a neighboring cell. 


Fig. 6.3 Turing program δ represented as a table Δ. Instruction $\delta(q_i, z_r) = (q_j, z_w, D)$
is described by the component $\Delta[q_i, z_r]$


2. Before the Turing machine is started, the following must take place: 


a. an input word is written to the beginning of the tape; 
b. the window is shifted to the beginning of the tape; 
c. the control unit is set to the initial state. 


3. From now on the Turing machine operates independently, in a mechanical stepwise
fashion as instructed by its Turing program $\delta$. Specifically, if the TM is in a
state $q_i \in Q$ and it reads a symbol $z_r \in \Gamma$, then:

if $q_i$ is a final state, then the TM halts;

else, if $\delta(q_i, z_r)\uparrow$ (i.e., TP has no next instruction), then the TM halts;

else, if $\delta(q_i, z_r)\downarrow = (q_j, z_w, D)$, then the TM does the following:

a. changes the state to $q_j$;

b. writes $z_w$ through the window;

c. moves the window to the next cell in direction $D \in \{\text{Left}, \text{Right}\}$, or leaves
the window where it is ($D = \text{Stay}$).


Formally, a Turing machine is a seven-tuple $T = (Q, \Sigma, \Gamma, \delta, q_1, \sqcup, F)$. To fix a
particular Turing machine, we must fix $Q$, $\Sigma$, $\Gamma$, $\delta$, and $F$.


Remarks. 1) Because $\delta$ is a partial function, it may be undefined for certain arguments $q_i, z_r$, that
is, $\delta(q_i, z_r)\uparrow$. In other words, a partial Turing program has no instruction $\delta(q_i, z_r) = (q_j, z_w, D)$
that would tell the machine what to do when in the state $q_i$ it reads the symbol $z_r$. Thus, for such
pairs, the machine halts. This always happens in final states ($q_{\text{yes}}$) and, for some Turing programs,
also in some non-final states ($q_{\text{no}}$). 2) The interpretation of these two different ways of halting (i.e.,
what they tell us about the input word or the result) will depend on what purpose the machine will
be used for. (We will see later that the Turing machine can be used for computing function values,
generating sets, or recognizing sets.)


Computation. What does a computation on a Turing machine look like? Re- 
call (Sect. 5.2.4) that the internal configuration of an abstract computing machine 
describes all the relevant information the machine possesses at a particular step of 
the computation. We now define the internal configuration of a Turing machine. 


Definition 6.2. (Internal Configuration) Let T be a basic TM and w an arbitrary
input word. Start T on w. The internal configuration of T after a finite
number of computational steps is the word $u q_i v$, where

• $q_i$ is the current state of T;
• $uv \in \Gamma^*$ are the contents of T’s tape up to (a) the rightmost non-blank
symbol or (b) the symbol to the left of the window, whichever is rightmost.
We assume that $v \neq \varepsilon$ in the case (a), and $v = \varepsilon$ in the case (b).
• T is scanning the leftmost symbol of $v$ in the case (a), and the symbol $\sqcup$ in
the case (b).


Not every sequence can be an internal configuration of T (given input w). Clearly,
the configuration prior to the first step of the computation is $q_1 w$; we call it the initial
configuration. After that, only sequences $u q_i v$ that can be reached from the initial
configuration by executing the program $\delta$ are internal configurations of T. So, if
$u q_i v$ is an internal configuration, then the next internal configuration can easily be
constructed using the instruction $\delta(q_i, z_r)$, where $z_r$ is the scanned symbol.

The computation of T on w is represented by a sequence of internal configura- 
tions starting with the initial configuration. Just as the computation may not halt, the 
sequence may also be infinite. (We will use internal configurations in Theorem 9.2.) 
Example 6.1. (Addition on TM) Let us construct a Turing machine that transforms an input word
$1^{n_1} 0 1^{n_2}$ into $1^{n_1 + n_2}$, where $n_1, n_2$ are natural numbers. For example, 111011 is to be transformed
into 11111. Note that the input word can be interpreted as consisting of two unary-encoded natural
numbers $n_1, n_2$ and the resulting word interpreted as containing their sum. Thus, the Turing
machine is to compute the function $sum(n_1, n_2) = n_1 + n_2$.

First, we give an intuitive description of the Turing program. If the first symbol of the input
word is 1, then the machine deletes it (instruction 1), and then moves the window to the right
over all the symbols 1 (instruction 2) until the symbol 0 is read. The machine then substitutes this
symbol with 1 and halts (instruction 3). However, if the first symbol of the input word is 0, then
the machine deletes it and halts (instruction 4).

Formally, the Turing machine is $T = (Q, \Sigma, \Gamma, \delta, q_1, \sqcup, \{q_3\})$, where:


• $Q = \{q_1, q_2, q_3\}$; $s = 3$; $q_1$ is the initial state, $q_3$ is the final state (hence $F = \{q_3\}$);
• $\Sigma = \{0, 1\}$;
• $\Gamma = \{0, 1, \sqcup\}$; $t = 3$;
• the Turing program consists of the following instructions:

1. $\delta(q_1, 1) = (q_2, \sqcup, \text{Right})$;

2. $\delta(q_2, 1) = (q_2, 1, \text{Right})$;

3. $\delta(q_2, 0) = (q_3, 1, \text{Stay})$;

4. $\delta(q_1, 0) = (q_3, \sqcup, \text{Stay})$.

The state $q_3$ is final, because $\delta(q_3, z)\uparrow$ for all $z \in \Gamma$ (there are no instructions $\delta(q_3, z) = \ldots$).


For the input word 111011, the computation is illustrated in Fig. 6.4. 


Fig. 6.4 Computing 3 + 2 on a TM. Over the arrows are written the numbers of applied instructions


The corresponding sequence of internal configurations is:


$q_1 111011 \to \sqcup q_2 11011 \to \sqcup 1 q_2 1011 \to \sqcup 11 q_2 011 \to \sqcup 11 q_3 111$.


If the input was 011, instruction 4 would execute, leaving the result 11. For the input 1110, the 
computation would proceed as in the figure above; only the result would be 111. 
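
The behaviour of the machine in Example 6.1 can also be checked mechanically. The following Python sketch is only an illustration, not part of the formal definition. It assumes that states are written as strings such as "q1", that the empty-cell symbol is written as "_", and that a state in square brackets marks the window position; the names run and config are hypothetical helpers. The sketch executes the four instructions and records the internal configurations in the sense of Definition 6.2.

```python
BLANK = "_"  # assumed stand-in for the empty-cell symbol

# Turing program of Example 6.1:
# (state, read symbol) -> (new state, written symbol, direction)
DELTA = {
    ("q1", "1"): ("q2", BLANK, "Right"),  # instruction 1
    ("q2", "1"): ("q2", "1", "Right"),    # instruction 2
    ("q2", "0"): ("q3", "1", "Stay"),     # instruction 3
    ("q1", "0"): ("q3", BLANK, "Stay"),   # instruction 4
}
FINAL = {"q3"}


def config(tape, state, pos):
    """Internal configuration u q v: the tape contents up to the rightmost
    non-blank symbol or the symbol left of the window, whichever is rightmost."""
    last = max((i for i, s in enumerate(tape) if s != BLANK), default=-1)
    end = max(last + 1, pos)
    return "".join(tape[:pos]) + f"[{state}]" + "".join(tape[pos:end])


def run(delta, final, word, start="q1", max_steps=10_000):
    """Run the program on `word`; return (halting state, result word, configurations)."""
    tape, state, pos = list(word) or [BLANK], start, 0
    trace = [config(tape, state, pos)]
    for _ in range(max_steps):
        if state in final or (state, tape[pos]) not in delta:
            # The TM halts: final state, or no instruction for (state, symbol).
            return state, "".join(tape).strip(BLANK), trace
        state, written, move = delta[(state, tape[pos])]
        tape[pos] = written
        if move == "Right":
            pos += 1
            if pos == len(tape):
                tape.append(BLANK)  # the tape is unbounded to the right
        elif move == "Left":
            if pos == 0:
                raise RuntimeError("window would leave the one-way tape")
            pos -= 1
        trace.append(config(tape, state, pos))
    raise RuntimeError("no halt within max_steps")


if __name__ == "__main__":
    state, result, trace = run(DELTA, FINAL, "111011")
    print(" -> ".join(trace))
    print(state, result)
```

Replacing DELTA and FINAL by the nine instructions and the final state of another program lets the same loop trace that machine as well; the step bound max_steps is only a safeguard of the sketch, since a Turing program need not halt.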
Example 6.2. (Addition on Another TM) Let us solve the same problem with another Turing
machine that has a different Turing program. First, the window is moved to the right until $\sqcup$ is
reached. Then the window is moved to the left (i.e., to the last symbol of the input word) and
the symbol is deleted. If the deleted symbol is 0, the machine halts. Otherwise, the window keeps
moving to the left and upon reading 0 the symbol 1 is written and the machine halts.

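The Turing machine is $T' = (Q, \Sigma, \Gamma, \delta, q_1, \sqcup, \{q_5\})$, where:


• $Q = \{q_1, q_2, q_3, q_4, q_5\}$; $s = 5$; $q_1$ is the initial and $q_5$ the final state ($F = \{q_5\}$);
• $\Sigma = \{0, 1\}$;
• $\Gamma = \{0, 1, \sqcup\}$; $t = 3$;
• Turing program:

1. $\delta(q_1, 1) = (q_2, 1, \text{Right})$

2. $\delta(q_1, 0) = (q_2, 0, \text{Right})$

3. $\delta(q_2, 1) = (q_2, 1, \text{Right})$

4. $\delta(q_2, 0) = (q_2, 0, \text{Right})$

5. $\delta(q_2, \sqcup) = (q_3, \sqcup, \text{Left})$

6. $\delta(q_3, 0) = (q_5, \sqcup, \text{Stay})$

7. $\delta(q_3, 1) = (q_4, \sqcup, \text{Left})$

8. $\delta(q_4, 1) = (q_4, 1, \text{Left})$

9. $\delta(q_4, 0) = (q_5, 1, \text{Stay})$

There are no instructions of the form $\delta(q_5, z) = \ldots$, so the state $q_5$ is final.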


For the input word 111011, the computation is illustrated in Fig. 6.5. 


Fig. 6.5 Computing 3 + 2 on a different TM. The numbers of applied instructions are on the arrows


The corresponding sequence of internal configurations is:


$q_1 111011 \to 1 q_2 11011 \to 11 q_2 1011 \to 111 q_2 011 \to$
$\to 1110 q_2 11 \to 11101 q_2 1 \to 111011 q_2 \to 11101 q_3 1 \to$
$\to 1110 q_4 1 \to 111 q_4 01 \to 111 q_5 11$




6.1.2 Generalized Models 


Turing considered different variants of the basic model; each is a generalization of 
the basic model in some respect. The variants differ from the basic model in their 
external configurations. For example, finite memory can be added to the control 
unit; the tape can be divided into parallel tracks, or it can become unbounded in both 
directions, or it can even be multi-dimensional; additional tapes can be introduced; 
and nondeterministic instructions can be allowed in Turing programs. The variants 
V of the basic model are: 


• Finite-Storage TM. This variant V has in its control unit a finite storage capable
of memorizing $k \geq 1$ tape symbols and using them during the computation. The
Turing program is formally $\delta_V : Q \times \Gamma \times \Gamma^k \to Q \times \Gamma \times \{\text{Left}, \text{Right}, \text{Stay}\} \times \Gamma^k$.


Fig. 6.6 Finite-storage TM can store k tape symbols


• Multi-track TM. This variant V has the tape divided into $tk \geq 2$ tracks. On each
track there are symbols from the alphabet $\Gamma$. The window displays $tk$-tuples of
symbols, one symbol for each track. Formally the Turing program is the function
$\delta_V : Q \times \Gamma^{tk} \to Q \times \Gamma^{tk} \times \{\text{Left}, \text{Right}, \text{Stay}\}$.


Fig. 6.7 Multi-track TM has 
tk tracks on its tape 


• Two-Way Unbounded TM. This variant V has the tape unbounded in both directions.
Formally, the Turing program is $\delta_V : Q \times \Gamma \to Q \times \Gamma \times \{\text{Left}, \text{Right}, \text{Stay}\}$.


Fig. 6.8 Two-way TM has unbounded tape in both directions


• Multi-tape TM. This variant V has $tp \geq 2$ unbounded tapes. Each tape has its
own window that is independent of other windows. Formally, the Turing program
is $\delta_V : Q \times \Gamma^{tp} \to Q \times (\Gamma \times \{\text{Left}, \text{Right}, \text{Stay}\})^{tp}$.


Fig. 6.9 Multi-tape TM has tp 
tapes with separate windows 


• Multidimensional TM. This variant V has a d-dimensional tape, $d \geq 2$. The
window can move in d dimensions, i.e., 2d directions $L_1, R_1, L_2, R_2, \ldots, L_d, R_d$.
The Turing program is $\delta_V : Q \times \Gamma \to Q \times \Gamma \times \{L_1, R_1, L_2, R_2, \ldots, L_d, R_d, \text{Stay}\}$.


Fig. 6.10 Multidimensional TM has a d-dimensional tape

• Nondeterministic TM. This variant V has a Turing program $\delta_V$ that assigns to
each $(q_i, z_r)$ a finite set of alternative transitions $\{(q_{j_1}, z_{w_1}, D_1), (q_{j_2}, z_{w_2}, D_2), \ldots\}$.
In each $(q_i, z_r)$, the machine V can miraculously pick out from the set $\delta_V(q_i, z_r)$
a transition (if such exists) that can lead the remaining computation to a
final state $q_{\text{yes}}$. Accordingly, we define that the machine V accepts a given input
word x if there exists a computation of V on x that terminates in a final state $q_{\text{yes}}$;
otherwise, the machine V immediately rejects x and halts.


Obviously, the nondeterministic TM is not a reasonable model of computation 
because it can foretell the future computation from each of the alternative tran- 
sitions. Nevertheless, it is a very useful tool that makes it possible to define the 
minimum number of steps needed to compute the solution (when there is one). 
Again, this is important when we investigate the computational complexity of 
problem solving. For computability on unrestricted models of computation, this 
is irrelevant. So we will not be using nondeterministic Turing machines. 
6.1.3 Equivalence of Generalized and Basic Models


Although each generalized model V of the TM seems to be more powerful than the
basic model T, it is not so; T can compute anything that V can compute. We will
prove this by describing how T can simulate V. (The other way round is obvious as
T is a special case of V.) In what follows we describe the main ideas of the proofs
and leave the details to the reader as an exercise.


• Simulation of a Finite-Storage TM. Let V be a finite-storage TM with the
storage capable of memorizing $k \geq 1$ tape symbols. The idea of the simulation is
that T can memorize a finite number of tape symbols by encoding the symbols 
in the indexes of its states. Of course, this may considerably enlarge the number 
of T’s states, but recall that in the definition of the Turing machine there is no 
limitation on the number of states, as long as this number is finite. 


To implement the idea, we do the following: 1) We redefine the indexes of T’s 
states by enforcing an internal structure on each index (i.e., by assuming that 
certain data is encoded in each index). 2) Using the redefined indexes, we are able 
to describe how T’s instructions make sure that the tape symbols are memorized 
in the indexes of T’s states. 3) Finally, we show that the redefined indexes can be 
represented by natural numbers. This shows that T is still the basic model of the 
Turing machine. 


The details are as follows: 


1. Let us denote by $[i, m_1, \ldots, m_k]$ any index that encodes k symbols $z_{m_1}, \ldots, z_{m_k}$
that were read from T’s tape (not necessarily the last-read symbols). Thus,
the state $q_{[i, m_1, \ldots, m_k]}$ represents some usual T state $q_i$ as well as the contents
$z_{m_1}, \ldots, z_{m_k}$ of V’s storage. Since at the beginning of the computation there are
no memorized symbols, i.e., $z_{m_1} = \ldots = z_{m_k} = \sqcup$, and since $\sqcup = z_t$, where
$t = |\Gamma|$, we see that the initial state is $q_{[1, t, \ldots, t]}$.

2. The Turing program of T consists of two kinds of instructions. Instructions
of the first kind do not memorize the symbol $z_r$ that has been read
from the current cell. Such instructions are of the form
$\delta(q_{[i, m_1, \ldots, m_k]}, z_r) = (q_{[j, m_1, \ldots, m_k]}, z_w, D)$.
Note that they leave the memorized symbols $z_{m_1}, \ldots, z_{m_k}$
unchanged. In contrast, the instructions of the second kind do memorize
the symbol $z_r$. This symbol can replace any of the memorized symbols
(as if $z_r$ had been written into any of the k locations of V’s storage). Let
$\ell$, $1 \leq \ell \leq k$, denote which of the memorized symbols is to be substituted
by $z_r$. The general form of the instruction is now
$\delta(q_{[i, m_1, \ldots, m_k]}, z_r) = (q_{[j, m_1, \ldots, m_{\ell-1}, r, m_{\ell+1}, \ldots, m_k]}, z_w, D)$.
After the execution of such an instruction, T
will be in a new state that represents some usual state $q_j$ as well as the new
memorized symbols $z_{m_1}, \ldots, z_{m_{\ell-1}}, z_r, z_{m_{\ell+1}}, \ldots, z_{m_k}$.

3. There are $s t^k$ indexes $[i, m_1, \ldots, m_k]$, where $1 \leq i \leq s = |Q|$ and $1 \leq m_\ell \leq t = |\Gamma|$
for $\ell = 1, \ldots, k$. We can construct a bijective function $f$ from the set of all the
indexes $[i, m_1, \ldots, m_k]$ onto the set $\{1, 2, \ldots, s t^k\}$. (We leave the construction
of $f$ as an exercise.) Using $f$, the states $q_{[i, m_1, \ldots, m_k]}$ can be renamed into the
usual form $q_{f([i, m_1, \ldots, m_k])}$, where the indexes are natural numbers.


In summary, a finite-storage TM V can be simulated by a basic Turing machine
T if the indexes of T’s states are appropriately interpreted.
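
The construction of the bijection f is left as an exercise above; one possible choice, given here only as a sketch, is to read the index [i, m_1, ..., m_k] as a mixed-radix numeral with one leading digit for the state and k digits in base t for the memorized symbols. The Python names f and f_inverse and the small parameters s, t, k below are assumptions made for the illustration.

```python
from itertools import product


def f(index, t):
    """Map an index [i, m_1, ..., m_k] (all components counted from 1)
    to a natural number in {1, ..., s * t**k}, reading it as a mixed-radix numeral."""
    i, *ms = index
    n = i - 1
    for m in ms:
        n = n * t + (m - 1)
    return n + 1


def f_inverse(n, t, k):
    """Recover [i, m_1, ..., m_k] from the value of f."""
    n -= 1
    ms = []
    for _ in range(k):
        n, digit = divmod(n, t)
        ms.append(digit + 1)  # digits come out as m_k, ..., m_1
    return [n + 1] + ms[::-1]


if __name__ == "__main__":
    s, t, k = 2, 3, 2  # two states, three tape symbols, storage for two symbols
    indexes = [[i, *ms] for i in range(1, s + 1)
               for ms in product(range(1, t + 1), repeat=k)]
    print(sorted(f(ix, t) for ix in indexes))  # each of 1..s*t**k appears once
    print(f([1, t, t], t))                     # new name of the initial state
```

Any other bijection would serve equally well; the simulation needs only that different indexes receive different natural numbers.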


• Simulation of a Multi-track TM. Let V be a tk-track TM, $tk \geq 2$. The idea of
the simulation is that T considers tk-tuples of V’s symbols as single symbols (see
Fig. 6.11). Let V have on the ith track symbols from $\Gamma_i$, $i = 1, 2, \ldots, tk$. Then let
T’s tape alphabet be $\Gamma_1 \times \Gamma_2 \times \ldots \times \Gamma_{tk}$. If V has in its program $\delta_V$ an instruction
$\delta_V(q_i, z_{r_1}, \ldots, z_{r_{tk}}) = (q_j, z_{w_1}, \ldots, z_{w_{tk}}, D)$, then let T have in its program $\delta_T$ the
instruction $\delta_T(q_i, (z_{r_1}, \ldots, z_{r_{tk}})) = (q_j, (z_{w_1}, \ldots, z_{w_{tk}}), D)$. Then T simulates V.


Fig. 6.11 Each pair of symbols in V’s window is considered as a single symbol by T


• Simulation of a Two-Way Unbounded TM. Let V be a two-way unbounded
TM. Assume that, initially, V has its window positioned over the leftmost symbol
of the input word. Denote the cell with this symbol by $c_0$. Fold the left part of
V’s tape (the part to the left of $c_0$) so that the cell $c_{-1}$ moves under $c_0$, the cell
$c_{-2}$ under $c_1$, and so on (see Fig. 6.12). The result can be viewed as a one-way
two-track tape of a new TM V’, having the input word written on its upper track.


The machine V’ can simulate V as follows: 1) Initially, V’ writes a delimiter •
to $c_{-1}$; 2) whatever V does on the right part $c_0, c_1, \ldots$ of its tape, V’ does on the
upper track of its tape; and whatever V does on the left part $\ldots, c_{-2}, c_{-1}$ of its
tape, V’ does on the part $c_{-1}, c_{-2}, \ldots$ on the lower track of its tape, while moving
its window in the direction opposite to that of V; and 3) whenever V’s window
moves from $c_0$ to $c_{-1}$ or from $c_{-1}$ to $c_0$, the machine V’ passes to the opposite
track and moves its window one cell to the right or left, respectively. Finally, we
know that V’ can be simulated by a basic TM T. Hence V can be simulated by T.


Fig. 6.12 The left part of V’s tape is folded under the right one to obtain the tracks of V’’s tape 


• Simulation of a Multi-tape TM. Let V be a tp-tape TM, $tp \geq 2$. Note that after
$k \geq 1$ steps the leftmost and rightmost window can be at most $2k + 1$ cells apart.
(The distance is maximized when, in each step, the two windows move apart by
two cells.) Other windows are between (or as far as) the two outermost windows.


Let us now imagine a two-way unbounded, 2tp-track Turing machine V’ (see
Fig. 6.13). The machine V’ can simulate V as follows. Each two successive tracks
of V’ describe the situation on one tape of V. That is, the situation on the ith tape
of V is described by the tracks $2i - 1$ and $2i$ of V’. The contents of the track $2i - 1$
are the same as would be the contents of the ith tape of V. Track $2i$ is empty,
except for one cell containing the symbol X. The symbol X is used to mark what
V would see on its ith tape (i.e., where the window on the ith tape would be).


Fig. 6.13 Each tape of V is represented by two successive tracks of V’’s tape 


In one step, V would read tp symbols through its windows, write tp new sym- 
bols, move tp windows, and change the state. How can V’ simulate all of this? 
The answer is by moving its single window to and fro and changing the contents 
of its tracks until they reflect the situation after V’s step. Actually, V’ can do this 
in two sweeps, first by moving its window from the leftmost X to the rightmost 
one, and then back to the leftmost X. When, during the first sweep, V’ reads an 
X, it records the symbol above the X. In this way, V’ records all the tp symbols 
that V would read through its windows. During the second sweep, V’ uses infor- 
mation about V’s Turing program: If V’ detects an X, it substitutes the symbol 
above the X with another symbol (the same symbol that V would write on the 
corresponding tape) and, if necessary, moves the X to the neighboring cell (in the 
direction in which V would move the corresponding window). After the second 
sweep is complete, V’ changes to the new state (corresponding to V’s new state). 


Some questions are still open. How does V’ know that an outermost X has been
reached so that the sweep is completed? The machine V’ must have a counter
(on an additional track) that tells how many Xs are still to be detected in the current
sweep. Before a sweep starts, the counter is set to tp, and during the sweep the
counter is decremented upon each detection of an X. When the counter reaches 0,
the window is over an outermost X. This happens in finite time because the outermost
Xs are at most $2k + 1$ cells apart. Since each single move of V can be simulated
by V’ in finite time, every computation of V can also be simulated by V’.


The machine V’ is a two-way unbounded, multi-track TM. However, from the
previous simulations it follows that V’ can be simulated by the basic model of
the Turing machine T. Hence, V can be simulated by T.


• Simulation of a Multidimensional TM. Let V be a d-dimensional TM, $d \geq 2$.
Let us call the minimal rectangular d-polytope containing every nonempty cell
of V’s tape the d-box. For example, the 2-box is a rectangle (see Fig. 6.14) and
the 3-box is a rectangular hexahedron. For simplicity we continue with d = 2. A
2-box contains rows (of cells) of equal length (the number of cells in the row).
Fig. 6.14 The rows of V’s current 2-box are delimited by # on V’’s tape


The machine V can be simulated by a two-way unbounded TM V’. The tape of V’ 
contains finitely many rows of V’s current 2-box, delimited by the symbol #. If 
V were to move its window within the current 2-box, then V’ moves its window 
either to the neighboring cell in the same row (when V would move its window 
in the same row), or to the corresponding cell in the left/right neighboring row 
(when V would move its window to the upper/lower row). If, however, V would 
move its window across the border of the current 2-box, then V’ either adds a 
new row before/after the existing rows (when V would cross the upper/lower 
border), or adds one empty cell to the beginning/end of each existing row and 
moves # as necessary (when V would cross the left/right border). We see that in 
order to extend a row by one cell, V’ has to shift a part of the contents of its tape. 
V’ can achieve this by moving the contents in a stepwise fashion, cell after cell. 
(Alternatively, V’ can be supplied with a finite storage in its control unit. Using 
this, V’ can shift the contents in a single sweep in a caterpillar-type movement.) 
The generalization to higher dimensions d is left to the reader as an exercise. 


The machine V’ is a two-way unbounded TM. As we know, V’ can be simulated 
by the basic model of the TM. 
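
The layout of a 2-box on the tape of V’ can be made concrete with a few lines of Python. This is a sketch under assumptions that the text does not fix: the rows are written from top to bottom, a single # stands between neighboring rows, and "_" stands for the empty-cell symbol. The function names are hypothetical.

```python
BLANK = "_"  # assumed stand-in for the empty-cell symbol


def linearize(box):
    """Write the rows of a 2-box one after another, separated by '#'."""
    return "#".join("".join(row) for row in box)


def position(row, col, width):
    """Tape position of the cell (row, col); each row occupies
    `width` cells followed by one '#'."""
    return row * (width + 1) + col


def extend_right(box):
    """V crosses the right border: each row gets one more empty cell at its end."""
    return [row + [BLANK] for row in box]


if __name__ == "__main__":
    box = [list("abc"), list("def")]
    tape = linearize(box)
    print(tape, tape[position(1, 1, 3)])
    print(linearize(extend_right(box)))
```

Under these assumptions, a move of V to the row below corresponds to a jump of width + 1 cells on the tape of V’, and extending the box by a column forces V’ to shift everything behind the first row, which is the stepwise shifting described above.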


• Simulation of a Nondeterministic TM. Let V be a nondeterministic TM with
a Turing program $\delta_V$. The instructions are of the form


$\delta_V(q_i, z_r) = \{(q_{j_1}, z_{w_1}, D_1), (q_{j_2}, z_{w_2}, D_2), \ldots, (q_{j_k}, z_{w_k}, D_k)\}$


where $(q_{j_1}, z_{w_1}, D_1), (q_{j_2}, z_{w_2}, D_2), \ldots$ are alternative transitions and k depends
on i and r. Call $|\delta_V(q_i, z_r)|$ the indeterminacy of the instruction $\delta_V(q_i, z_r)$, and
$u = \max_{q_i, z_r} |\delta_V(q_i, z_r)|$ the indeterminacy of the program $\delta_V$. We call a sequence
of numbers $i_1, i_2, \ldots, i_\ell$, $1 \leq i_j \leq u$, the scenario of the execution of $\delta_V$. We say
that $\delta_V$ executes alongside the scenario $i_1, i_2, \ldots, i_\ell$ if the first instruction makes
the $i_1$th transition, the second instruction makes the $i_2$th transition, and so on.


Let us now describe a three-tape (deterministic) TM V’ that will simulate the
machine V. The first tape contains the input word x (the same as V). The second
tape is used for systematic generation of the scenarios of the execution of $\delta_V$ (in
shortlex order). The third tape is used to simulate the execution of $\delta_V$ on x as if
$\delta_V$ executed alongside the current scenario.
The machine V’ operates as follows: 1) V’ generates the next scenario on the
second tape, clears the contents of the third tape, and copies x from the first tape
to the third one. 2) On the third tape, V’ simulates the execution of $\delta_V$ alongside
the current scenario. If the simulation terminates in $q_{\text{yes}}$ (i.e., V would halt on x
in $q_{\text{yes}}$), then V’ accepts x and halts (as V would accept x and halt). If, during the
simulation, the scenario requires a nonexisting transition, or the simulation terminates
in a $q_{\text{no}}$ state, then V’ returns to the first step. 3) If none of the generated
scenarios terminates in $q_{\text{yes}}$, V’ rejects x and halts (as V would reject x and halt).


If V accepts x, i.e., halts on x in $q_{\text{yes}}$, then it does so after executing a certain
miraculously guessed scenario. But V’ eventually generates this scenario and
simulates V according to it. So, V’ too halts on x in $q_{\text{yes}}$ and accepts x.


It remains to simulate V’ with T. But we have already proved that this is possible. 
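
The systematic search through scenarios can be sketched in Python. The sketch is not the three-tape machine itself: ordinary variables take the place of the tapes, the example program DELTA_V is hypothetical, "_" stands for the empty-cell symbol, and the bound max_len is an assumption added so that the search always returns. Scenarios are generated in shortlex order, and a scenario that asks for a nonexisting transition is abandoned.

```python
from itertools import product

BLANK = "_"  # assumed stand-in for the empty-cell symbol


def run_scenario(delta, final, word, scenario, start="q1"):
    """Simulate V alongside one scenario (a tuple of 1-based transition choices).
    Returns True iff the simulation ends in a final state."""
    tape, state, pos = list(word) or [BLANK], start, 0
    for choice in scenario:
        if state in final:
            break
        options = delta.get((state, tape[pos]), [])
        if choice > len(options):
            return False  # the scenario requires a nonexisting transition
        state, written, move = options[choice - 1]
        tape[pos] = written
        if move == "Right":
            pos += 1
            if pos == len(tape):
                tape.append(BLANK)
        elif move == "Left":
            if pos == 0:
                return False  # simplification: leaving the tape ends this scenario
            pos -= 1
    return state in final


def accepts(delta, final, word, max_len=10):
    """Try all scenarios in shortlex order, up to the length max_len."""
    u = max((len(ts) for ts in delta.values()), default=0)  # indeterminacy of the program
    for length in range(max_len + 1):
        for scenario in product(range(1, u + 1), repeat=length):
            if run_scenario(delta, final, word, scenario):
                return True
    return False  # nothing found within the bound; longer scenarios are not examined


# Hypothetical example machine: it guesses where a block "11" begins.
DELTA_V = {
    ("q1", "0"): [("q1", "0", "Right")],
    ("q1", "1"): [("q1", "1", "Right"), ("q2", "1", "Right")],
    ("q2", "1"): [("qyes", "1", "Stay")],
}

if __name__ == "__main__":
    print(accepts(DELTA_V, {"qyes"}, "0110"))
    print(accepts(DELTA_V, {"qyes"}, "0101"))
```

Each scenario restarts the simulation from the initial configuration, just as V’ clears its third tape and copies x again before trying the next scenario.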


Remark. There is another useful view of the simulation. With each Turing program $\delta_V$ and
input word x we can associate a directed tree, called the decision tree of $\delta_V$ for the input x.
Vertices of the tree are the pairs $(q_i, z_r) \in Q \times \Gamma$ and there is an arc $(q_i, z_r) \leadsto (q_j, z_w)$ iff there
is a transition from $(q_i, z_r)$ to $(q_j, z_w)$, i.e., $(q_j, z_w, D) \in \delta_V(q_i, z_r)$ for some D. The root of the
tree is the vertex $(q_1, a)$, where a is the first symbol of x. A scenario is a path from the root to
a vertex of the tree. During the execution of $\delta_V$, the nondeterministic TM V starts at the root
and then miraculously picks out, at each vertex, an arc that is on some path to some final vertex
$(q_{\text{yes}}, z)$. If there is no such arc, the machine miraculously detects that and immediately halts.
In contrast, the simulator V’ has no miraculous capabilities, so it must systematically generate
and check scenarios until one ending in a vertex $(q_{\text{yes}}, z)$ is found, or no such scenario exists.
If there are finitely many scenarios, the simulation terminates.


6.1.4 Reduced Model 


In Definition 6.1 (p. 112) of the basic TM, certain parameters are fixed; e.g., $q_1$
denotes the initial state; $z_1, z_2, z_t$ denote the symbols $0, 1, \sqcup$, respectively; and the
tape is a one-way unbounded single-track tape. We could also fix other parameters,
e.g., $\Sigma$, $\Gamma$, and $F$ (with the exception of $\delta$ and $Q$, because fixing either of these would
result in a finite number of different Turing programs). We say that, by fixing these
parameters, the basic TMs are reduced.

But why do that? The answer is that reduction simplifies many things, because
reduced TMs differ only in their $\delta$s and $Q$s, i.e., in their Turing programs. So let us
fix $\Gamma$ and $\Sigma$, while fulfilling the condition $\{0,1\} \subseteq \Sigma \subseteq \Gamma - \{\sqcup\}$ from Definition 6.1.
We choose the simplest option, $\Sigma = \{0,1\}$ and $\Gamma = \{0,1,\sqcup\}$. No generality is lost
by doing so, because any other $\Sigma$ and $\Gamma$ can be encoded by 0s and 1s. In addition,
by merging the final states into one, say $q_2$, we can fix the set of final states to a
singleton $F = \{q_2\}$. Then, a reduced Turing machine is a seven-tuple


$T = (Q, \{0,1\}, \{0,1,\sqcup\}, \delta, q_1, \sqcup, \{q_2\})$.


To obtain a particular reduced Turing machine, we must choose only Q and 6. 
6.1.5 Equivalence of Reduced and Basic Models


Is the reduced model of the TM less powerful than the basic one? The answer is no. 
The two models are equivalent, as they can simulate each other. Let us describe this. 

Given an arbitrary reduced TM $R = (Q_R, \{0,1\}, \{0,1,\sqcup\}, \delta_R, q_{1R}, \sqcup, \{q_{2R}\})$,
there is a basic TM $T = (Q_T, \Sigma_T, \Gamma_T, \delta_T, q_{1T}, \sqcup, F_T)$ capable of simulating R. Just
take $Q_T := Q_R$, $\Sigma_T := \{0,1\}$, $\Gamma_T := \{0,1,\sqcup\}$, $\delta_T := \delta_R$, $q_{1T} := q_{1R}$, and $F_T := \{q_{2R}\}$.

Conversely, let $T = (Q_T, \Sigma_T, \Gamma_T, \delta_T, q_{1T}, \sqcup, F_T)$ be an arbitrary basic TM. We can
describe a finite-storage TM $S = (Q_S, \{0,1\}, \{0,1,\sqcup\}, \delta_S, q_{1S}, \sqcup, \{q_{2S}\})$ that simulates
T on an arbitrary input $w \in \Sigma_T^*$. Since S does not recognize T’s tape symbols,
we assume that w is binary-encoded, i.e., each symbol of w is replaced by its binary
code. Let $n = \lceil \log_2 |\Gamma_T| \rceil$ be the length of this code. Thus, the binary input to S is of
length $n|w|$. The machine S has storage in its control unit to record the code of T’s
current symbol (i.e., the symbol that would be scanned by T at that time). In addition,
S’s control unit has storage to record T’s current state (i.e., the state in which
T would be at that time). This storage is initialized to $q_{1T}$, T’s initial state. Now S can
start simulating T. It repeatedly does the following: (1) It reads and memorizes n
consecutive symbols from its tape. (2) Suppose that T’s current state is $q_{iT}$ and the
memorized symbols encode the symbol $z_r \in \Gamma_T$. If $\delta_T(q_{iT}, z_r) = (q_{jT}, z_w, D_T)$ is an
instruction of $\delta_T$, then S writes the code of $z_w$ into n cells of its tape (thus replacing
the code of $z_r$ with the code of $z_w$), memorizes $q_{jT}$, and moves the window to the
beginning of the neighboring group of n cells (if $D_T$ is Left or Right) or to the beginning
of the current group of n cells (if $D_T$ = Stay). As we have seen in Sect. 6.1.3,
we can replace S with an equivalent TM R with no storage in its control unit. This
is the sought-for reduced TM that is equivalent to T.
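
The binary encoding of T’s tape symbols used by S can be illustrated as follows. The particular assignment of codes (symbols numbered in the order in which they are listed) and the example alphabet are assumptions of this sketch; the text requires only that every symbol receives a code of length n. The integer expression used for n agrees with the ceiling of the base-2 logarithm for alphabets with at least two symbols.

```python
def make_codes(gamma):
    """Assign each symbol of T's tape alphabet a binary code of length
    n = ceil(log2 |gamma|)."""
    n = max(1, (len(gamma) - 1).bit_length())  # integer form of ceil(log2(len(gamma)))
    return {z: format(i, f"0{n}b") for i, z in enumerate(gamma)}, n


def encode_word(word, codes):
    """Replace each symbol of the word by its binary code."""
    return "".join(codes[z] for z in word)


def decode_word(bits, codes, n):
    """Read the bits in groups of n cells, as S does, and recover the symbols."""
    symbol_of = {code: z for z, code in codes.items()}
    return [symbol_of[bits[i:i + n]] for i in range(0, len(bits), n)]


if __name__ == "__main__":
    gamma_T = ["0", "1", "a", "b", "_"]  # hypothetical tape alphabet of T
    codes, n = make_codes(gamma_T)
    bits = encode_word("ab1", codes)
    print(n, bits, decode_word(bits, codes, n))
```

The encoded word has length n|w|, in agreement with the length of the binary input to S stated above.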


NB The reduced model of the Turing machine enables us to identify Turing ma- 
chines with their Turing programs. This can simplify the discussion. 


6.1.6 Use of Different Models 


Computations on different models of the Turing machine can differ considerably in
terms of time (i.e., the number of steps) and space (i.e., the number of visited cells).
But this becomes important only when we are interested in the computational complexity
of problem solving. As concerns general computability, i.e., computability
on unrestricted models of computation, this is irrelevant.

So the question is: Are different models of the Turing machine of any use in 
Computability Theory? The answer is yes. The generalized models are useful when 
we try to prove the existence of a TM for solving a given problem. Usually, the 
construction of such a TM is easier if we choose a more versatile model of the 
Turing machine. In contrast, if we must prove the nonexistence of a TM for solving 
a given problem, then it is usually better to choose a more primitive model (e.g., the 
basic or reduced model). 
Fortunately, by referring to the Computability Thesis, we can avoid the cumbersome
and error-prone construction of TMs (see Sect. 5.3.5). We will do this whenever
only the existence of a TM is important (and not a particular TM).


6.2 Universal Turing Machine 


Recall that Gödel enumerated formulas of Formal Arithmetic A and, in this way,
enabled formulas to express facts about other formulas and, eventually, about them- 
selves. As we have seen (Sect. 4.2.3), such a self-reference of formulas revealed 
the most important facts about formal axiomatic systems and their theories. Can a 
similar approach reveal important facts about Turing machines as well? The idea 
is this: If Turing machines were somehow enumerated (i.e., each TM described 
by a characteristic natural number, called the index), then each Turing machine T 
could compute with other Turing machines simply by including their indexes in T’s 
input word. Of course, certain questions would immediately arise: How does one 
enumerate Turing machines with natural numbers? What kind of computing with 
indexes makes sense? How does one use computing with indexes to discover new 
facts about Turing machines? In this section we will explain how, in 1936, Turing 
answered these questions. 


6.2.1 Coding and Enumeration of Turing Machines 


In order to enumerate Turing machines, we must define how Turing machines will 
be encoded, that is, represented by words over some coding alphabet. The idea is 
that we only encode Turing programs $\delta$, but in such a way that the other components
$Q, \Sigma, \Gamma, F$, which determine the particular Turing machine, can be restored from the
program’s code. An appropriate coding alphabet is $\{0,1\}$, because it is included in
the input alphabet $\Sigma$ of every TM. In this way, every TM would also be able to read
codes of other Turing machines and compute with them, as we suggested above. So,
let us see how a Turing machine is encoded in the alphabet $\{0,1\}$.
Let $T = (Q, \Sigma, \Gamma, \delta, q_1, \sqcup, F)$ be an arbitrary basic Turing machine. If


$\delta(q_i, z_j) = (q_k, z_\ell, D_m)$

is an instruction of its Turing program, we encode the instruction by the word

$K = 0^i 1 0^j 1 0^k 1 0^\ell 1 0^m$,

where $D_1$ = Left, $D_2$ = Right, and $D_3$ = Stay.


In this way, we encode each instruction of the program $\delta$. Then, from the obtained
codes $K_1, K_2, \ldots, K_r$ we construct the code $\langle T \rangle$ of the TM T as follows:

$\langle T \rangle = 111\, K_1\, 11\, K_2\, 11 \ldots 11\, K_r\, 111$.   $(*)$


We can interpret (T) to be the binary code of some natural number. Let us call this 
number the index of the Turing machine T (i.e., its program). Note, however, that 
some natural numbers are not indexes, because their binary codes are not structured 
as (*). To avoid this, we make the following convention: Any natural number whose 
binary code is not of the form (*) is an index of a special Turing machine called the 
empty Turing machine. The program of this machine is everywhere undefined, i.e.,
for each possible pair of state and tape symbol. Thus, for every input, the empty 
Turing machine immediately halts, in zero steps. 

Consequently, the following proposition holds. 


Proposition 6.1. Every natural number is the index of exactly one Turing machine. 


Given an arbitrary index $\langle T \rangle$, we can easily restore from it the components
$\Sigma$, $\Gamma$, $Q$, and $F$ of the corresponding basic TM $T = (Q, \Sigma, \Gamma, \delta, q_1, \sqcup, F)$; see Sect. 7.2.
So, the above construction of $\langle T \rangle$ implicitly defines a total mapping $g : \mathbb{N} \to \mathcal{T}$,
where $\mathcal{T}$ is the set of all basic Turing machines. Given an arbitrary $n \in \mathbb{N}$, $g(n)$
can be viewed as the nth basic TM and be denoted by $T_n$. By letting n run through
0, 1, 2, ... we obtain the sequence $T_0, T_1, T_2, \ldots$, i.e., an enumeration of all basic TMs.
Later, on p. 139, g will be called the enumeration function of the set $\mathcal{T}$. Of course,
corresponding to $T_0, T_1, T_2, \ldots$ is the enumeration $\delta_0, \delta_1, \delta_2, \ldots$ of Turing programs.
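
The mapping from natural numbers to Turing programs can be sketched in Python. The sketch returns only the program, as a dictionary from pairs (i, j) to triples (k, ℓ, m), and does not restore the other components of the machine. Two points are assumptions of the sketch rather than statements of the text: the function name program_of, and the decision to treat a code that contains two instructions for the same pair $(q_i, z_j)$ as a code of the empty Turing machine.

```python
import re

# One instruction code K = 0^i 1 0^j 1 0^k 1 0^l 1 0^m
# with i, j, k, l >= 1 and m in {1, 2, 3}.
K = r"0+10+10+10+10{1,3}"
CODE = re.compile(rf"111({K}(?:11{K})*)111")


def program_of(n):
    """Return the Turing program with index n as a dict
    (i, j) -> (k, l, m), read as delta(q_i, z_j) = (q_k, z_l, D_m).
    The empty dict stands for the empty Turing machine."""
    match = CODE.fullmatch(format(n, "b"))
    if match is None:
        return {}  # the binary code of n is not of the form (*)
    program = {}
    for word in match.group(1).split("11"):
        i, j, k, ell, m = (len(block) for block in word.split("1"))
        if (i, j) in program:
            return {}  # assumption: conflicting instructions give the empty TM
        program[(i, j)] = (k, ell, m)
    return program


if __name__ == "__main__":
    print(program_of(0))        # not of the form (*): the empty Turing machine
    print(program_of(0b111010101010111))  # the single instruction code 010101010
```

Splitting at 11 is safe because inside an instruction code every 1 is surrounded by 0s, so two adjacent 1s can occur only as a separator.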


Remark. We could base the coding and enumeration of TMs on the reduced model. In this
case, instructions $\delta(q_i, z_j) = (q_k, z_\ell, D_m)$ would be encoded by somewhat shorter words
$K = 0^i 1 0^j 1 0^k 1 0^\ell 1 0^m$, since $j, \ell \in \{0, 1, 2\}$, and no restoration of $\Sigma$ and $\Gamma$ from $\langle T \rangle$ would be needed.
However, due to the simulation, the programs of the reduced TMs would contain more instructions
than the programs of the equivalent basic models. Consequently, their codes would be even longer.


Example 6.3. (TM Code) The code of the TM T in Example 6.1 (p. 115) is

⟨T⟩ = 111 01001001000100 11 00100100100100 11 001010001001000 11 010100010001000 111
          K_1               K_2               K_3                K_4

and the corresponding index is 1075142408958020240455, a large number.
The code ⟨T′⟩ of the TM T′ in Example 6.2 (p. 116) is


⟨T′⟩ = 111 0100100100100 11 01010010100 11 00100100100100 11 001010010100 11
           K_1              K_2            K_3               K_4
       0010001000100010 11 0001010000010001000 11 00010010000100010 11
       K_5                 K_6                    K_7
       00001001000010010 11 0000101000001001000 111.
       K_8                  K_9


This code is different from ⟨T⟩ because T′ has a different Turing program. The corresponding
index is 1331016200817824779885199405923287804819703759431.

Obviously, the indexes of Turing machines are huge numbers and hence not very 
practical. Luckily, we will almost never need their actual values, and they will not 
be operands of any arithmetic operation. 
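
The passage from an index back to a Turing program can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: it infers the form of a code from Example 6.3, namely 111 K_1 11 K_2 11 ... 11 K_r 111 with each K = 0^i 1 0^j 1 0^k 1 0^ℓ 1 0^m, and it assumes that every exponent is at least 1. The names `decode` and `encode` are hypothetical and do not come from the text.

```python
import re

# One instruction word K = 0^i 1 0^j 1 0^k 1 0^l 1 0^m (assumed: all exponents >= 1).
K = r"0+10+10+10+10+"
# A whole code: 111 K 11 K 11 ... 11 K 111 (form inferred from Example 6.3).
CODE = re.compile(rf"111({K}(?:11{K})*)111")


def decode(index: int):
    """Return the instructions (i, j, k, l, m) encoded by the binary code of `index`.

    A number whose binary code does not have the assumed form is treated as an
    index of the empty Turing machine, represented here by the empty list.
    """
    match = CODE.fullmatch(format(index, "b"))
    if match is None:
        return []
    return [tuple(len(block) for block in word.split("1"))
            for word in match.group(1).split("11")]


def encode(instructions) -> int:
    """Inverse of decode for a non-empty list of instructions."""
    words = ["1".join("0" * e for e in ins) for ins in instructions]
    return int("111" + "11".join(words) + "111", 2)


# The index of T from Example 6.3 decodes to four instructions:
print(decode(1075142408958020240455))
# → [(1, 2, 2, 3, 2), (2, 2, 2, 2, 2), (2, 1, 3, 2, 3), (1, 1, 3, 3, 3)]
```

Each tuple (i, j, k, ℓ, m) stands for an instruction δ(q_i, z_j) = (q_k, z_ℓ, D_m). Because every natural number either matches the pattern or falls back to the empty machine, `decode` is total, which mirrors Proposition 6.1.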


6.2.2 The Existence of a Universal Turing Machine 


In 1936, using the enumeration of his machines, Turing discovered a seminal fact 
about Turing machines. We state the discovery in the following proposition. 


Proposition 6.2. There is a Turing machine that can compute whatever is 
computable by any other Turing machine. 


Proof. The idea is to construct a Turing machine U that is capable of simulating any 
other TM T. To achieve this, we use the method of proving by CT (see Sect. 5.3.5): 
(a) we describe the concept of the machine U and informally describe the algorithm 
executed by its Turing program, and (b) we refer to CT to prove that U exists. 


(a) The concept of the machine U: 


Fig. 6.15 Universal TM: tapes and their contents (input tape with ⟨T⟩ and w, work tape, auxiliary tape)


• Tapes of the machine U (see Fig. 6.15):


1. The first is the input tape. This tape contains an input word consisting of
two parts: the code ⟨T⟩ of an arbitrary TM T = (Q, Σ, Γ, δ, q_1, ⊔, F), and an
arbitrary word w.

2. The second is the work tape. Initially, it is empty. The machine U will use it 
in exactly the same way as T would use its own tape when given the input w. 

3. The third is the auxiliary tape. Initially, it is empty. The machine U will use it 
to record the current state in which the simulated T would be at that time, and 
for comparing this state with the final states of T.

• The Turing program of U should execute the following informal algorithm:


1. Check whether the input word is ⟨T, w⟩, where ⟨T⟩ is a code of some TM.
   If it is not, halt.

2. From ⟨T⟩ restore the set F and write the code ⟨q_1, F⟩ to the auxiliary tape.

3. Copy w to the work tape and shift the window to the beginning of w.

4. // Let the aux. tape have ⟨q_i, F⟩ and the work tape window scan a symbol z_r.
   If q_i ∈ F, halt. // T would halt in a final state.

5. On the input tape, search in ⟨T⟩ for the instruction beginning with “δ(q_i, z_r) =”.

6. If not found, halt. // T would halt in a non-final state.

7. // Suppose that the found instruction is δ(q_i, z_r) = (q_j, z_w, D).
   On the work tape, write the symbol z_w and move the window in direction D.

8. On the auxiliary tape, replace ⟨q_i, F⟩ by ⟨q_j, F⟩.

9. Continue with step 4.


(b) The above algorithm can be executed by a human. So, according to the Computability
Thesis, there is a Turing machine U = (Q_U, Σ_U, Γ_U, δ_U, q_1U, ⊔, F_U) whose
program δ_U executes this algorithm. We call U the Universal Turing Machine.
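
Steps 3 to 9 of the informal algorithm can be mirrored by a short Python sketch. It is not a Turing program for U; it only illustrates the control flow, assuming that the program of T has already been restored from ⟨T⟩ into a dictionary. The names `simulate` and `render`, the blank symbol "_", the direction letters, and the `max_steps` guard are assumptions made for this illustration (U itself has no step limit).

```python
def render(tape, blank):
    """Return the tape contents between the leftmost and rightmost written cells."""
    if not tape:
        return ""
    cells = range(min(tape), max(tape) + 1)
    return "".join(tape.get(i, blank) for i in cells).strip(blank)


def simulate(program, finals, w, blank="_", max_steps=10_000):
    """Simulate T on input w.

    program: dict mapping (state, symbol) -> (new_state, new_symbol, direction)
    finals:  set of final states F
    Returns (halted_in_final_state, tape_contents), or None if max_steps is exceeded.
    """
    tape = dict(enumerate(w))       # step 3: copy w to the work tape
    state, pos = 1, 0               # auxiliary tape holds q_1; window at the start of w
    for _ in range(max_steps):
        if state in finals:         # step 4: T would halt in a final state
            return True, render(tape, blank)
        key = (state, tape.get(pos, blank))
        if key not in program:      # steps 5 and 6: no instruction found
            return False, render(tape, blank)
        state, symbol, direction = program[key]     # steps 7 and 8
        tape[pos] = symbol
        pos += {"L": -1, "R": 1, "S": 0}[direction]
    return None                     # gave up; T may or may not halt


# Hypothetical machine that appends one 1 to a unary word.
append_one = {(1, "1"): (1, "1", "R"), (1, "_"): (2, "1", "S")}
print(simulate(append_one, {2}, "111"))   # → (True, '1111')
```

The same function runs any machine passed to it as data, which is the point of the construction: the simulated program is an input, not part of the simulator.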


Small Universal Turing Machines 


The universal Turing machine U was actually described in detail. It was to be expected
that ⟨U⟩ would be a huge sequence of 0s and 1s. Indeed, for example, the code
of U constructed by Penrose² and Deutsch³ in 1989 had about 5,500 bits.

Shannon⁴ was aware of this when in 1956 he posed the problem of the construction
of the simplest universal Turing machine U. He was interested in the simplest
two-way unbounded model of such a machine. Thus, U was to be deterministic
with no storage in its control unit, and have a single two-way infinite tape with one
track. To measure the complexity of U, Shannon proposed the product |Q_U| · |Γ_U|.
The product is an upper bound on the number of instructions in the program δ_U
(see Fig. 6.3 on p. 113). Alternatively, the complexity of U could be measured more
realistically by the number of actual instructions in δ_U.

Soon it became clear that there is a trade-off between |Q_U| and |Γ_U|: the number
of states can be decreased if the number of tape symbols is increased, and vice versa.
So the researchers focused on different classes of universal Turing machines. Such
a class is denoted by UTM(s,t), for some s,t ≥ 2, and by definition contains all the
universal Turing machines with s states and t tape symbols (of the above model).

In 1996, Rogozhin⁵ found universal Turing machines in the classes UTM(2,18),
UTM(3,10), UTM(4,6), UTM(5,5), UTM(7,4), UTM(10,3), and UTM(24,2). Of
these, the machine U ∈ UTM(4,6) has the smallest number of instructions: 22.
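
As a small worked illustration of Shannon's measure, the following Python lines compute the product s · t for each of the classes listed above. The variable names are hypothetical; the only data used are the class parameters and the instruction count 22 quoted in the text.

```python
# (number of states s, number of tape symbols t) for the classes listed above
classes = [(2, 18), (3, 10), (4, 6), (5, 5), (7, 4), (10, 3), (24, 2)]

# Shannon's measure s * t bounds the number of instructions from above.
bounds = {(s, t): s * t for s, t in classes}
smallest = min(bounds, key=bounds.get)

for (s, t), product in bounds.items():
    print(f"UTM({s},{t}): at most {product} instructions")
print("smallest product:", smallest, bounds[smallest])
```

Among these classes the product is smallest for UTM(4,6), where it equals 24, so the 22 instructions reported for that machine stay within the bound.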


2 Roger Penrose, b. 1931, British physicist, mathematician and philosopher.

3 David Elieser Deutsch, b. 1953, British physicist.

4 Claude Elwood Shannon, 1916-2001, American mathematician and electronics engineer.
5 Yurii Rogozhin, b. 1949, Moldavian mathematician and computer scientist.


6.2.3 The Importance of the Universal Turing Machine 


We can now upgrade Fig. 5.13 (p. 99) to Fig. 6.16. The existence of the univer- 
sal TM indicated that it might be possible to design a general-purpose computing 
machine—something that is today called the general-purpose computer. 


Fig. 6.16 The universal Turing machine can execute any Turing program and thus compute the solution of any problem


6.2.4 Practical Consequences: Data vs. Instructions 


Notice that the machine U uses both the program of T and the input data to T as
its own input data, i.e., as two input words ⟨T⟩ and w written in the same alphabet
Σ_U. U interprets the word ⟨T⟩ as a sequence of instructions to be simulated (i.e.,
executed), and the word w as input to the simulated T. This consequence is one of
the important discoveries of Turing.


Consequence 6.1. (Data vs. Instructions) There is no a priori difference be- 
tween data and instructions; the distinction between the two is established by 
their interpretation. 


6.2.5 Practical Consequences: General-Purpose Computer 


Turing’s proof of the existence of a universal Turing machine was a theoretical proof 
that a general-purpose computing machine is possible. This answers the question 
raised by Babbage a century earlier (see p. 6). Thus, the following practical conse- 
quence of Turing’s discovery was evident. 


Consequence 6.2. (General-Purpose Computer) It is possible to construct a 
physical computing machine that can compute whatever is computable by any 
other physical computing machine.

The construction of a general-purpose computing machine started at the beginning
of the 1940s. After initial unsuccessful trials, which were mainly due to
the teething troubles of electronics, researchers developed the first, increasingly
efficient general-purpose computing machines, now called computers. These included
ENIAC, EDVAC, and IAS, which were developed by research teams led
by Mauchly,⁶ Eckert,⁷ von Neumann, and others. By the mid-1950s, a dozen other
computers had emerged.


Von Neumann’s Architecture 


Interestingly, much of the development of early computers did not closely follow 
the structure of the universal Turing machine. The reasons for this were both the 
desire for the efficiency of the computing machine and the technological conditions 
of the time. Abstracting the essential differences between these computers and the 
universal TM, and describing the differences in terms of Turing machines, we find 
the following: 


• Cells are now enumerated.

• The control unit does not access cells by a time-consuming movement of the
window. Indeed, there is no window. Instead, the control unit directly accesses
an arbitrary cell in constant time by using an additional component.

• The program is no longer in the control unit. Instead, the program is written in
cells of the tape (as is the input data to the program).

• The control unit still executes the program in a stepwise fashion, but the control
unit has different duties. Specifically, in each step, it typically does the following:


1. reads an instruction from a cell;
2. reads operands from cells;
3. executes the operation on the operands;
4. writes the result to a cell.


To do this, the control unit uses additional components: the program counter, 
which describes from which cell the next instruction of the program will be read, 
registers, which store operands, and a special register, called the accumulator, 
where the result of the operation is left. 


Of course, due to these differences, terminological differences also arose. For
example, we now speak of main memory (≈ tape), program (≈ Turing program), processor
(≈ control unit), memory location (≈ cell), and memory address (≈ cell number). The
general structure of these computers was called the von Neumann architecture (af- 
ter an influential report on the logical design of the EDVAC computer, in which 
von Neumann described the key findings of its design team). 


6 John William Mauchly, 1907-1980, American physicist. 
7 John Adam Presper Eckert, Jr., 1919-1995, American engineer and computer scientist.


6.2.6 Practical Consequences: Operating System 


What takes care of loading a program P, which is to be executed by a computer, into 
the memory? This is the responsibility of the operating system (OS). The operating 
system is a special program that is resident in memory. When it executes, it takes 
care of everything needed to execute any other program P. In particular: 


1. It reads the program P and its input data from the computer’s environment. 

2. It loads P into the memory. 

3. It sets apart additional memory space, which P will be using during its execution. 
This space contains the data region with P’s input data and other global variables; 
the runtime stack for local variables of procedures and procedure-linkage infor- 
mation; and the heap for dynamic allocation of space when explicitly demanded 
by P. 

4. It initiates P’s execution by transferring control to it, i.e., by writing to the pro- 
gram counter the address of P’s first instruction. 

5. When P halts, it takes over and gives a chance to the next waiting program. 


In time, additional goals and tasks were imposed on operating systems, mainly 
because of the desire to improve the efficiency of the computer and its user- 
friendliness. Such tasks include multiprogramming (i.e., supporting the concur- 
rent execution of several programs), memory management (i.e., simulating a larger 
memory than available in reality), file system (i.e., supporting permanent data stor- 
age), input/output management (i.e., supporting communication with the environ- 
ment), protection (i.e., protecting programs from other programs), security (i.e., 
protecting programs from the environment), and networking (i.e., supporting com- 
munication with other computers). 


But there remains the question of what loads the very OS into the memory and 
starts it. For this, hardware is responsible. There is a small program, called the 
bootstrap loader, embedded in the hardware. This program 


1. starts automatically when the computer is turned on, 
2. loads the OS to the memory, and 
3. transfers the control to the OS. 


In terms of Turing machines, the bootstrap loader is the Turing program in the con- 
trol unit of the universal Turing machine U. Its role, however, is to read the simulator 
(i.e., OS) into the control unit, thus turning U into a true universal TM. (The reading 
of finitely many data and hence the whole simulator into the control unit of a Turing 
machine is possible, as we described in Sect. 6.1.3.) 


We conclude that a modern general-purpose computer can be viewed as a universal
Turing machine.


6.2.7 Practical Consequences: RAM Model of Computation 


After the first general-purpose computers emerged, researchers tried to abstract the 
essential properties of these machines in a suitable model of computation. They 
suggested several models of computation, of which we mention the register machine
and its variants RAM and RASP.⁸ All of these were proved to be equivalent
to the Turing machine: What can be computed on the Turing machine can also be 
computed on any of the new models, and vice versa. In addition, the new models dis- 
played a property that was becoming increasingly important as solutions to more and 
more problems were attempted using the computers. Namely, these models proved 
to be more suitable for and realistic in estimating the computational resources (e.g., 
time and space) needed for computations on computers. This is particularly true of 
the RAM model. Analysis of the computational complexity of problems, i.e., an es- 
timation of the amount of computing resources needed to solve a problem, initiated 
Computational Complexity Theory, a new area of Computability Theory. So, let us 
describe the RAM model of computation. 

While the Turing machine has sequential access to data on its tape, RAM ac- 
cesses data randomly. There are several variants of RAM but they differ only in the 
level of detail in which they reflect the von Neumann architecture. We present a
more detailed one in Fig. 6.17.


Definition 6.3. (RAM) The random access machine (RAM) model of computation
has several components: the processor with registers, two of which
are the program counter and the accumulator; the main memory; the input and 
output memory; and a program with input data. Also the following holds: 


8 The register machine and its variants RAM (random access machine) and RASP (random access 
stored program) were gradually defined by Wang (1954), Melzak and Minsky (1961), Shepherdson 
and Sturgis (1963), Elgot and Robinson (1964), Hartmanis (1971), and Cook and Reckhow (1973). 


1. a. Input and Output Memory: During the computation, input data is read
      sequentially from the input memory, and the results are written to the
      output memory. Each memory is a sequence of equally sized locations.
      The location size is arbitrary, but finite. A location is empty or contains
      an integer.

   b. Main Memory: This is a potentially infinite sequence of equally sized
      locations m_0, m_1, .... The index i is called the address of m_i. Each
      location is directly accessible by the processor: Given an arbitrary i,
      reading from m_i or writing to m_i is accomplished in constant time.

   c. Registers: This is a sequence of locations r_1, r_2, ..., r_n, n ≥ 2, in the
      processor. Registers are directly accessible. Two of them have special
      roles. Program counter pc (= r_1) contains the address of the location
      in the main memory that contains the instruction to be executed next.
      Accumulator a (= r_2) is involved in the execution of each instruction.
      Other r_i are given roles as needed.

   d. Program: The program is a finite sequence of instructions. The details
      of the instructions are not very important as long as the RAM is of
      limited capability (see Sect. 5.2.4). So, it is assumed that the instructions
      are similar to the instructions of real computers. Thus, there are
      arithmetical, logic, input/output, and (un)conditional jump instructions.
      If n = 2, each instruction contains the information op about the
      operation and, depending on op, the information i about the operand.
      There may be additional modes of addressing, e.g., indirect (denoted
      by *i) and immediate (denoted by =i). Examples of instructions are:


      read      (read data from input memory to accumulator a)
      load i    (a := m_i)                 load *i   (a := m_{m_i})
      add i     (a := a + m_i)             add =i    (a := a + i)
      jmp i     (pc := i)
      jz i      (if a = 0 then pc := i)    jgz i     (if a > 0 then pc := i)
      store i   (m_i := a)
      write     (write data from accumulator a to output memory)
      halt      (halt)

2. Before the RAM is started, the following is done: (a) a program is loaded
   into the main memory (into successive locations starting with m_0); (b)
   input data are written to the input memory; (c) the output memory and
   registers are cleared.

3. From this point on, the RAM operates independently in a mechanical stepwise
   fashion as instructed by its program. Let pc = k at the beginning of
   a step. (Initially, k = 0.) From the location m_k, the instruction I is read
   and started. At the same time pc is incremented. So, when I is completed,
   the next instruction to be executed is in m_{k+1}, unless one of the following
   holds: a) I was jmp i; b) I was jz i or jgz i and pc was assigned i; c)
   I was halt; d) I changed pc so that it contains an address outside the
   program. In c) and d) the program halts.
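
The stepwise operation described in the definition can be made concrete with a small interpreter. The following Python sketch covers only the example instructions listed above, with a register set reduced to pc and a. The representation of instructions as (op, operand) pairs, the operand notation 7 / "*7" / "=7", the treatment of an empty location as 0, and the `max_steps` guard are assumptions of this sketch, not part of the definition.

```python
def run_ram(program, inputs, max_steps=10_000):
    """Run a RAM program given as a list of (op, operand) pairs; return the output list."""
    mem = dict(enumerate(program))      # the program occupies m_0, m_1, ...
    source, out = iter(inputs), []      # input memory and output memory
    pc, a = 0, 0                        # registers are cleared before the start

    def operand_value(operand):
        # 7 means m_7 (direct), "*7" means m_{m_7} (indirect), "=7" means 7 (immediate)
        text = str(operand)
        if text.startswith("="):
            return int(text[1:])
        if text.startswith("*"):
            return mem.get(mem.get(int(text[1:]), 0), 0)
        return mem.get(int(text), 0)

    for _ in range(max_steps):
        if not 0 <= pc < len(program):  # case d): pc points outside the program
            break
        op, operand = mem[pc]
        pc += 1                         # pc is incremented while the instruction executes
        if op == "halt":                # case c)
            break
        elif op == "read":
            a = next(source, 0)         # assumption: an empty input location reads as 0
        elif op == "write":
            out.append(a)
        elif op == "load":
            a = operand_value(operand)
        elif op == "add":
            a += operand_value(operand)
        elif op == "store":
            mem[int(operand)] = a
        elif op == "jmp":
            pc = int(operand)
        elif op == "jz" and a == 0:
            pc = int(operand)
        elif op == "jgz" and a > 0:
            pc = int(operand)
    return out


# Hypothetical program: sum the inputs up to the first 0 and write the total.
sum_program = [
    ("read", None),   # 0: a := next input
    ("jz", 5),        # 1: if a = 0, go to the output part
    ("add", 10),      # 2: a := a + m_10 (running total)
    ("store", 10),    # 3: m_10 := a
    ("jmp", 0),       # 4: next input
    ("load", 10),     # 5: a := m_10
    ("write", None),  # 6: output the total
    ("halt", None),   # 7
]
print(run_ram(sum_program, [3, 4, 5, 0]))   # → [12]
```

Memory is a dictionary here, so reading or writing m_i takes one lookup regardless of i; this stands in for the constant-time direct access that distinguishes the RAM from the Turing machine's window movement.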

We now state the following important proposition. 


Proposition 6.3. (RAM vs. TM) The RAM and the Turing machine are equivalent: 
What can be computed on one of them can be computed on the other. 


Box 6.1 (Proof of Proposition 6.3). We show how TM and RAM simulate each other. 


Simulation of a TM with a RAM. Let T = (Q, Σ, Γ, δ, q_1, ⊔, F) be an arbitrary TM. The RAM
will have its main memory divided into three parts. The first part, consisting of the first p locations
m_0, ..., m_{p−1}, will contain the RAM’s program. The second part will contain the Turing program
δ. The third part will be the rest of the main memory; during the simulation, it will contain the
same data as T would have on its tape. Two of the RAM’s registers will play special roles: r_3 will
reflect the current state of T, and r_4 will contain the address of the location in the RAM’s main
memory corresponding to the cell under T’s window.

The RAM is initialized as follows. Let us view δ as a table Δ = Q × Γ, where the component
Δ[q,z] = (q′,z′,D) if δ(q,z) = (q′,z′,D) is an instruction of δ, and Δ[q,z] = 0 if δ(q,z)↑.
Since there are d = |Q| · |Γ| components in Δ, we can bijectively map them to the d locations
m_p, ..., m_{p+d−1}. So, we choose a bijection ℓ: Δ → {p, ..., p+d−1} and write each Δ[q,z] into
the location m_{ℓ(q,z)}. (A possible ℓ would map row after row of Δ into the memory.) The third part
of the RAM’s memory is cleared and T’s input word is written to the beginning of it, i.e., into the
locations m_{p+d}, m_{p+d+1}, .... The registers r_3 and r_4 are set to q_1 and p+d, respectively.

Now, the simulation of T starts. Each step of T is simulated as follows. Based on the values
q = (r_3) and z = m_{(r_4)}, the RAM reads the value of δ(q,z) from m_{ℓ(q,z)}. If this value is 0, the RAM
halts because δ(q,z)↑. Otherwise, the value read is (q′,z′,D), so the RAM must simulate the
instruction δ(q,z) = (q′,z′,D). This it can do in three steps: 1) r_3 := q′; 2) m_{(r_4)} := z′; 3) depending
on D, it decrements r_4 (D = Left), increments r_4 (D = Right), or leaves r_4 unchanged (D = Stay).
Remark. Note that the RAM could simulate any other TM. To do this, it would only change δ in
the second part of the main memory and, of course, adapt the value of d.
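
The memory layout used in this simulation can be sketched in Python. The sketch makes several assumptions that are not in the text: states are numbered 1, 2, ..., tape symbols are numbered 0, 1, ... with 0 as the blank, the bijection ℓ is the row-major one, the first part of memory (the RAM's own program) is only reserved and not filled in because Python plays its role, and the window never moves left of the first tape cell.

```python
def tm_on_ram(delta, n_states, n_symbols, word, p=8, max_steps=10_000):
    """Simulate a TM whose program delta is stored as a table in RAM-style memory.

    delta maps (q, z) -> (q_new, z_new, D) with D in {"Left", "Right", "Stay"}.
    Returns the final value of r3 (the state) and the contents of the tape region.
    """
    d = n_states * n_symbols

    def loc(q, z):
        # Row-major bijection from table components to the locations p, ..., p+d-1.
        return p + (q - 1) * n_symbols + z

    # Second part of memory: the table, with 0 marking an undefined component.
    mem = {loc(q, z): delta.get((q, z), 0)
           for q in range(1, n_states + 1) for z in range(n_symbols)}
    # Third part of memory: the input word, starting at address p + d.
    for offset, z in enumerate(word):
        mem[p + d + offset] = z

    r3, r4 = 1, p + d                   # current state and address of the scanned cell
    for _ in range(max_steps):
        entry = mem[loc(r3, mem.get(r4, 0))]
        if entry == 0:                  # delta(q, z) is undefined: halt
            break
        q_new, z_new, move = entry
        r3 = q_new                                          # 1) r3 := q'
        mem[r4] = z_new                                     # 2) m_(r4) := z'
        r4 += {"Left": -1, "Right": 1, "Stay": 0}[move]     # 3) adjust r4
    return r3, [mem.get(addr, 0) for addr in range(p + d, max(mem) + 1)]


# Hypothetical TM: move right over 1s, write a 1 on the first blank, enter state 2.
delta = {(1, 1): (1, 1, "Right"), (1, 0): (2, 1, "Stay")}
print(tm_on_ram(delta, n_states=2, n_symbols=2, word=[1, 1, 1]))   # → (2, [1, 1, 1, 1])
```

Replacing `delta` (and with it d) is all that is needed to simulate a different machine, which is the observation made in the Remark.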


Simulation of a RAM with a TM. Let R be an arbitrary RAM. R will be simulated by the following
multi-tape TM T. For each register r_i of R there is a tape, called the r_i-tape, whose contents will be
the same as the contents of r_i. Initially, it contains 0. There is also a tape, called the m-tape, whose
contents will be the same as the contents of R’s main memory. If, for i = 0, 1, 2, ..., the location m_i
would contain c_i, then the m-tape will have written | 0:c_0 | 1:c_1 | 2:c_2 | ... | i:c_i | ... (up to the last
nonempty word).

T operates as follows. It reads from the pc-tape the value of the program counter, say k, and
increments this value on the pc-tape. Then it searches the m-tape for the subsequence | k: . If the
subsequence is found, T reads c_k, i.e., R’s instruction I, and extracts from I both the information
op about the operation and the information i about the operand. What follows depends on the
addressing mode: (a) If I is op =i (immediate addressing), then the operand is i. (b) If I is op i
(direct addressing), then T searches the m-tape for the subsequence | i: . If the subsequence is
found, then the operand is c_i. (c) If I is op *i (indirect addressing), then, after T has found c_i, it
searches the m-tape for the subsequence | c_i: . If the subsequence is found, then the operand is c_{c_i}.
When the operand is known, T executes op using the operand and the contents of the a-tape.
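
The search on the m-tape replaces the RAM's direct access by a sequential scan. A minimal Python sketch of that lookup follows; it assumes, for illustration only, that the m-tape is given as a string in the format | 0:c_0 | 1:c_1 | ... and that all contents are integers (on the real m-tape, a location may also hold an instruction). The function name `read_location` is hypothetical.

```python
def read_location(m_tape, address):
    """Scan the m-tape left to right for the entry 'address:' and return its content.

    Returns None if no entry with this address is found.
    """
    for entry in m_tape.strip("| ").split(" | "):
        index, _, content = entry.partition(":")
        if int(index) == address:
            return int(content)
    return None


m_tape = "| 0:5 | 1:3 | 2:0 | 3:42 |"

direct = read_location(m_tape, 1)            # operand of "op 1" is c_1
indirect = read_location(m_tape, direct)     # operand of "op *1" is c_{c_1}
print(direct, indirect)                      # → 3 42
```

Each lookup may traverse the whole tape, so one RAM step costs the TM a number of steps that grows with the amount of memory in use. This affects efficiency only, not what can be computed.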


NB General-purpose computers are capable of computing exactly what Turing machines
can, assuming there are no limitations on the time and space consumed by
the computations. Because of this, we will continue to investigate computability by
using the Turing machine as the model of computation. The conclusions that we will
come to will also hold for modern general-purpose computers.


6.3 Use of a Turing Machine 


In this section we will describe three elementary tasks for which we can use a Turing 
machine: 1) to compute the values of a function; 2) to generate elements of a set; 
and 3) to find out whether objects are members of a set. 


6.3.1 Function Computation 


A Turing machine is implicitly associated, for each natural k ≥ 1, with a k-ary function
mapping k words into one word. Since the function is induced by the Turing
machine, we will call it the k-ary proper function of the Turing machine. Here are 
the details. 


Definition 6.4. (Proper Function) Let T = (Q, Σ, Γ, δ, q_1, ⊔, F) be an arbitrary
Turing machine and k ≥ 1 a natural number. The k-ary proper function of T
is a partial function ψ_T^(k) : (Σ*)^k → Σ*, defined as follows:

If the input word to T consists of words u_1, ..., u_k ∈ Σ*, then

ψ_T^(k)(u_1, ..., u_k) =def  v, if T halts in any state and the tape contains
                                only the word v ∈ Σ*;
                             ↑, else, i.e., T doesn’t halt or T halts but the tape
                                doesn’t contain a word in Σ*.

If e is the index of T, we also denote the k-ary proper function of T by ψ_e^(k).
When k is known from the context or it is not important, we write ψ_T or ψ_e.
The domain of ψ_e is also denoted by W_e, i.e., W_e = dom(ψ_e) = {x | ψ_e(x)↓}.


So, given k words u_1, ..., u_k written in the input alphabet of the Turing machine,
we write the words to the tape of the machine, start it and wait until the machine
halts and leaves a single word on the tape written in the same alphabet. If this does
happen, and the resulting word is denoted by v, then we say that the machine has
computed the value v of its k-ary proper function for the arguments u_1, ..., u_k.

The interpretation of the words u_1, ..., u_k and v is left to us. For example, we can
view the words as the encodings of natural numbers. In particular, we may use the
alphabet Σ = {0,1} to encode n ∈ N by 1^n, and use 0 as a delimiter between the
different encodings on the tape. For instance, the word 11101011001111 represents
the numbers 3, 1, 2, 0, 4. (The number 0 was represented by the empty word ε = 1^0.)
In this case, ψ_T^(k) is a k-ary numerical function with values represented by the words
1...1. When the function value is 0, the tape is empty (i.e., contains 1^0 = ε). Another
encoding will be given in Sect. 6.3.6.
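
The unary encoding just described is easy to check mechanically. The following Python sketch encodes a list of natural numbers as a word over {0,1} and decodes it again; the function names are hypothetical.

```python
def encode(numbers):
    """Encode each n as 1^n and separate the encodings by the delimiter 0."""
    return "0".join("1" * n for n in numbers)


def decode(word):
    """Split the word at the delimiters and count the 1s in each block."""
    return [len(block) for block in word.split("0")]


print(decode("11101011001111"))    # → [3, 1, 2, 0, 4]
print(encode([3, 1, 2, 0, 4]))     # → 11101011001111
```

Note that the empty tape decodes to the single number 0, in line with the convention that 0 is represented by the empty word.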

Usually we face the opposite task: Given a function φ : (Σ*)^k → Σ*, find a TM
capable of computing φ’s values. Thus, we must find a TM T such that ψ_T^(k) ≃ φ.


Depending on how powerful, if at all, such a T can be, i.e., depending on the extent
to which φ can possibly be computed, we distinguish between three kinds of
functions φ (in accordance with the formalization on p. 104).


Definition 6.5. Let φ : (Σ*)^k → Σ* be a function. We say that


φ is computable          if there is a TM that can compute φ
                         anywhere on dom(φ)
                         and dom(φ) = (Σ*)^k;
φ is partial computable  if there is a TM that can compute φ
                         anywhere on dom(φ);
φ is incomputable        if there is no TM that can compute φ
                         anywhere on dom(φ).


A TM that can compute φ anywhere on dom(φ) is also called the computer of φ.


Example 6.4. (Addition in Different Ways) The two Turing machines in Examples 6.1 (p. 115)
and 6.2 (p. 116) are computers of the function sum(n_1, n_2) = n_1 + n_2, where n_1, n_2 ≥ 0.


A slight generalization defines (in accordance with Definition 5.3 on p. 105) what
it means when we say that such a function is computable on a set S ⊆ (Σ*)^k.


Definition 6.6. Let φ : (Σ*)^k → Σ* be a function and S ⊆ (Σ*)^k. We say that


φ is computable on S          if there is a TM that can compute φ
                              anywhere on S;
φ is partial computable on S  if there is a TM that can compute φ
                              anywhere on S ∩ dom(φ);
φ is incomputable on S        if there is no TM that can compute φ
                              anywhere on S ∩ dom(φ).


6.3.2 Set Generation 


When can elements of a set be enumerated? That is, when can elements of a set 
be listed in a finite or infinite sequence so that each and every element of the set 
sooner or later appears in the sequence? Moreover, when can the sequence be gen- 
erated by an algorithm? These questions started the quest for the formalization of 
set generation. The answers were provided by Post, Church, and Turing. 


Post’s Discoveries 


Soon after Hilbert’s program appeared in the 1920s, Post started investigating the 
decidability of formal theories. In 1921, he proved that the Propositional Calculus P 
is a decidable theory by showing that there is a decision procedure (i.e., algorithm) 
that, for an arbitrary proposition of P, decides whether or not the proposition is prov- 
able in P. The algorithm uses truth-tables. Post then started investigating the decid- 
ability of the formal theory developed in Principia Mathematica (see Sect. 2.2.3). 
Confronted with sets of propositions, he realized that there is a strong similarity 
between the process of proving propositions in a formal theory and the process of 
“mechanical, algorithmic generating” of the elements of a set. Indeed, proving a 
proposition is, in effect, the same as “generating” a new element of the set of all 
theorems. This led Post to the questions, 


What does it mean to “algorithmically generate” elements of a set? 
Can every countable set be algorithmically generated? 


Consequently, he started searching for a model that would formally define the intu- 
itive concept of the “algorithmic generation of a set” (i.e., an algorithmic listing of 
all the elements of a set). Such a model is called the generator of a set. 

In 1920-1921, Post developed canonical systems and normal systems and proposed
these as generators. Informally, a canonical system consists of a symbol S,
an alphabet Σ, and a finite set P of transformation rules, called productions. Starting
with the symbol S, the system gradually transforms S through a sequence of
intermediate words into a word over Σ. We say that the word has been generated
by the canonical system. In each step of the generation, the last intermediate word
is transformed by a production applicable to it. Since there may be several applicable
productions, one must be selected. As different selections generally lead to
different generated words, the canonical system can generate a set of words over
Σ. Post also showed that productions can be simplified while retaining the generating
power of the system. He called canonical systems with simplified productions
normal systems. The set generated by a normal system he called a normal set.⁹


9 Post described his ideas in a less abstract and more readable way than was the usual practice
at the time. The reader is advised to read his influential and informative paper from 1944
(see References). Very soon, Post’s user-friendly style was adopted by other researchers. This
speeded up the exchange of ideas and results and, hence, the development of Computability Theory.


Box 6.2 (Normal Systems). 


A canonical system is a quintuple (V, Σ, X, P, S), where V is a finite set of symbols and Σ ⊂ V. The
symbols in Σ are said to be final, and the symbols in V − Σ are non-final. There is a distinguished
non-final symbol, S, called the start symbol. X is a finite set of variables. A variable can be
assigned a value that is an element of V*, i.e., any finite sequence of final and non-final symbols.
The remaining component of the quintuple is P. This is a finite set of transformation rules
called productions. A production describes the conditions under which, and the manner in which,
subwords of a word can be used to build a new word. The general form of a production is


α_0 x_1 α_1 x_2 ... α_{n−1} x_n α_n → β_0 x_{i_1} β_1 x_{i_2} ... β_{m−1} x_{i_m} β_m,


where α_i, β_j ∈ V* and x_k ∈ X. When and how can a production p be applied to a word v ∈ V*?
If each variable x_k in the left-hand side of p can be assigned a subword of v so that the left-hand
side of p becomes equal to v, then v can be transformed into a word v′, which is obtained from
the right-hand side of p by substituting variables x_k with their assigned values. Note that p may
radically change the word v: Some subwords of v may disappear; new subwords may appear in v′;
and all these constituents may be arbitrarily permuted in v′.

We write S ⇒*_P v to denote that the start symbol S can be transformed into v by a finite number
of applications of productions of P. We say that S generates the word w if S ⇒*_P w and w ∈ Σ*.
The set of words generated by P is denoted by G(P), that is, G(P) = {w ∈ Σ* | S ⇒*_P w}.

Post proved that the productions of any canonical system can be substituted by productions of
the form α_i x_k → x_k β_i. Canonical systems with such productions are said to be normal.
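
To make the normal form concrete, the following Python sketch generates words with productions of the form α x → x β: if the current word begins with α, that prefix is removed and β is appended. The example system, the function name `generate`, the use of single characters as symbols, and the length bound `max_len` (needed only to make the search finite) are assumptions of this illustration and are not taken from Post's work.

```python
from collections import deque


def generate(productions, start, final_symbols, max_len):
    """Return all words over final_symbols derivable from `start`.

    productions: list of pairs (alpha, beta), each standing for alpha x -> x beta.
    Only intermediate words of length at most max_len are explored.
    """
    seen, queue, generated = {start}, deque([start]), set()
    while queue:
        word = queue.popleft()
        if set(word) <= set(final_symbols):       # the word consists of final symbols only
            generated.add(word)
        for alpha, beta in productions:
            if word.startswith(alpha):             # the variable x takes the rest of the word
                new_word = word[len(alpha):] + beta
                if len(new_word) <= max_len and new_word not in seen:
                    seen.add(new_word)
                    queue.append(new_word)
    return generated


# Hypothetical normal system: V = {S, a}, final symbols {a},
# productions S x -> x a and a x -> x aa.
productions = [("S", "a"), ("a", "aa")]
print(sorted(generate(productions, "S", {"a"}, max_len=4)))
# → ['a', 'aa', 'aaa', 'aaaa']
```

Without the length bound the search would continue forever and list a, aa, aaa, ... one after another, which is the sense in which the system generates an infinite set.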


Post proved that the formal theory developed in Principia Mathematica can be 
represented as a normal system. Consequently, the set of theorems of Principia 
Mathematica is a normal set. Encouraged by this result, in 1921 he proposed the
following formalization of the intuitive notion of set “generation”:


Post Thesis. A set S can be “generated” ⟷ S is normal


Post did not prove this proposition. He was not aware that it cannot be proved. 
The reasons are the same as those that, 15 years later, prevented the proving of the 
Computability Thesis (see Sect. 5.3). Namely, the proposition is a variant of the 
Computability Thesis. 

In order to move on, he used the proposition as a working hypothesis (now called
the Post Thesis), i.e., something that he would eventually prove. The thesis enabled him
to progress in his research. Indeed, he made several important findings that were, 15
years later, independently and in different ways, discovered by Gödel, Church and
Turing. (In essence, these are the existence of undecidable sets, which we will come
to in the following sections, and the existence of the universal Turing machine.)
Unfortunately, Post did not publish his results because he felt sure that he should
prove his thesis first. As a byproduct of his attempts to do that, in 1936 he proposed
a model of computation that is now called the Post machine (see Sect. 5.2.3).


Church’s Approach 


In 1936, Church also became interested in the questions of set “generation”. While 
investigating sets whose elements are values of computable functions, he noticed: 
If a function g is computable on N, then one can successively compute the values 
g(0),g(1),g(2),... and hence generate the set {g(i) |i € N}. 

But the opposite task is more interesting: Given a set S, find a computable function
g: N → S so that {g(i) | i ∈ N} = S. If g exists, then S can be listed, i.e., all the
elements of S are g(0), g(1), g(2), ..., and enumerated, i.e., an element x ∈ S is said
to be nth in order if n is the smallest i ∈ N for which g(i) = x. Such a g is said to be
an enumeration function of the set S. (For example, the mapping g on p. 126 is an
enumeration function of the set of all Turing programs.)

These ideas were also applied by Kleene. He imagined a function g that maps
natural numbers into systems of equations ℰ, as defined by Herbrand and Gödel
(see p. 84). If g is computable, then by computing g(0), g(1), g(2), ... one generates
systems ℰ_0, ℰ_1, ℰ_2, ..., each of which defines a computable function. Kleene
then proved that there is no (total) computable function g on N such that g would
generate all systems of equations that define computable functions. The reader
may (correctly) suspect that Kleene’s proof is connected with the diagonalization
in Sect. 5.3.3.


Naturally, Post was interested in seeing how his normal systems compared with
Church’s generators (i.e., (total) computable functions). He readily proved the following
theorem. (We omit the proof.)


Theorem 6.1 (Normal Set). A set S is normal ⟹ S = ∅ ∨ S is the range of a
computable function on N.


TM as a Generator 

In addition to Post’s normal systems and Church’s computable functions, Turing
machines can also be generators. A Turing machine that generates a set S will be
denoted by G_S (see Fig. 6.18). The machine G_S writes to its tape, in succession, the
elements of S and nothing else. The elements are delimited by the appropriate tape
symbol in Γ − Σ (e.g., #).
It turned out that the three generators are equivalent in their generating power.
That is, if a set can be generated by one of them, it can be generated by any other. 
Because the Turing machine most convincingly formalized the basic notions of com- 
putation, we restate the Post Thesis in terms of this model of computation. 


Post Thesis. (TM) A set S can be “generated” ⟺ S can be generated by a TM


Due to the power of the Turing machine, the intuitive concept of set “generation” 
was finally formalized. Therefore, we will no longer use quotation marks. 

Sets that can be algorithmically generated were called normal by Post. Today, we 
call them computably enumerable sets. Here is the official definition. 


Definition 6.7. (c.e. Set) A set S is computably enumerable (for short c.e.)¹⁰ if
S can be generated by a TM.


From Theorem 6.1 and the above discussion we immediately deduce the follow- 
ing important corollary. 


Corollary 6.1. A set S is c.e. ⟹ S = ∅ ∨ S is the range of a computable
function on N.


Remarks. 1) Note that the order in which the elements of a c.e. set are generated is not prescribed. 
So, any order will do. 2) An element may be generated more than once; what matters is that it be 
generated at least once. 3) Each element of a c.e. set is generated in finite time (i.e., a finite number 
of steps of a TM). This, however, does not imply that the whole set can be generated in finite time, 
as the set may be infinite. 
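
The corollary and the remarks can be pictured with a small Python sketch. It is only an illustration under stated assumptions: a Python generator stands in for the Turing machine G_S, and the function g (squaring) is a hypothetical choice of a computable function on N, not one taken from the text.

```python
from itertools import count, islice

def generate(g):
    """Model of a generator G_S: emit g(0), g(1), g(2), ... without end."""
    for i in count():
        yield g(i)

def g(i):
    # Hypothetical computable function on N; its range is the set of squares.
    return i * i

# Each element appears after finitely many steps, but the whole (infinite)
# set is never finished, so we only look at a finite prefix.
first_five = list(islice(generate(g), 5))
```

Any other total function g would do; the order and possible repetitions of the emitted values are irrelevant, as the remarks point out.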


6.3.3 Set Recognition 


Let T be a Turing machine and w an arbitrary word. Let us write w as the input
word to T’s tape and start T. There are three possible outcomes. If T after reading w
eventually halts in a final state q_yes, then we say that T accepts w. If T after reading
w eventually halts in a non-final state q_no, we say that T rejects w. If, however, T
after reading w never halts, then we say that T does not recognize w. Thus, a Turing
machine T is implicitly associated with the set of all the words that T accepts. Since
the set is induced by T, we will call it the proper set of T. Here is the definition.


¹⁰ The older name is recursively enumerable (r.e.) set. See more about the renaming on p. 153.
Definition 6.8. (Proper Set) Let T = (Q, Σ, Γ, δ, q_1, ⊔, F) be a Turing machine.
The proper set¹¹ of T is the set L(T) ≝ {w ∈ Σ* | T accepts w}.


Usually we are confronted with the opposite task: Given a set S, find a Turing 
machine T such that L(T) = S. Put another way, we must find a Turing machine that 
accepts exactly the given set. Such a T is called an acceptor of the set S. However, 
we will see that the existence of an acceptor of S is closely connected with S’s 
amenability to set recognition. So let us focus on this notion. 


TM as a Recognizer of a Set 


Informally, to completely recognize a set in a given environment, also called the 
universe, is to determine which elements of the universe are members of the set and 
which are not. Finding this out separately for each and every element of the universe 
is impractical because the universe may be too large. Instead, it suffices to exhibit 
an algorithm capable of deciding this for an arbitrary element of the universe. 

What can be said about such an algorithm? Let us be given the universe U, an
arbitrary set S ⊆ U, and an arbitrary element x ∈ U. We ask: “Is x a member of the
set S?”, or, for short, x ∈? S. The answer is either YES or NO, as there is no third
possibility besides x ∈ S and x ∉ S.

However, the answer may not be obvious (Fig. 6.19). 


Fig. 6.19 Is x in S or in S̄ = U − S? The border between S and S̄ is not clear


So, let us focus on the construction of an algorithm A that will be capable of answering
the question x ∈? S. First, recall the definition of the characteristic function
of a set.


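Definition 6.9. The characteristic function of a set S, where S ⊆ U, is a function
χ_S : U → {0, 1} defined by

$$\chi_S(x) \overset{\text{def}}{=} \begin{cases} 1\ (= \text{YES}), & \text{if } x \in S; \\ 0\ (= \text{NO}), & \text{if } x \notin S. \end{cases}$$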


¹¹ Also called the language of the Turing machine T.
By definition, χ_S is total on U, that is, χ_S(x) is defined for every x ∈ U. If
the sought-for algorithm A could compute χ_S’s values, then it would answer the
question x ∈? S simply by computing the value χ_S(x). In this way the task of set
recognition would be reduced to the task of function computation.

But how would A compute the value χ_S(x)? The general definition of the
characteristic function χ_S reveals nothing about how to compute χ_S and, consequently,
how to construct A. What is more, the definition reveals nothing about the
computability of the function χ_S. So, until U and S are defined in greater detail, nothing
particular can be said about the design of A and the computability of χ_S.

Nevertheless, we can distinguish between three kinds of sets S, based on the
extent to which the values of χ_S can possibly be computed on U (and, consequently,
how S can possibly be recognized in U).


Definition 6.10. Let U be the universe and S ⊆ U be an arbitrary set. We say
that the set

S is decidable (or computable¹²) in U if χ_S is a computable function on U;
S is semi-decidable in U if χ_S is a computable function on S;
S is undecidable (or incomputable) in U if χ_S is an incomputable function on U.

(Remember that χ_S : U → {0, 1} is total.)


This, in turn, tells us how powerful the algorithm A can be: 


• When a set S is decidable in U, there exists an algorithm (Turing program) A
capable of deciding, for an arbitrary element x ∈ U, whether or not x ∈ S. We
call such an algorithm a decider of the set S in U and denote it by D_S. A decider
D_S makes it possible to completely recognize S in U, that is, to determine what
is in S and what is in S̄ = U − S.

• When a set S is semi-decidable in U, there is an algorithm A capable of determining,
for an arbitrary x ∈ S, that x is a member of S. If, however, in truth x ∉ S,
then A may or may not find this out (because it may not halt). We call such an
algorithm a recognizer of the set S in U and denote it by R_S. The recognizer
R_S makes it possible to completely determine what is in S, but it may or may not
be able to completely determine what is in S̄.

• When a set S is undecidable in U, there is no algorithm A that is capable of
deciding, for an arbitrary element x ∈ U, whether or not x ∈ S. In other words, any
candidate algorithm A fails to decide, for at least one element x ∈ U, whether or
not x ∈ S (because on such an input A gives an incorrect answer or never halts).
This can happen for an x that is in truth a member of S or S̄. In addition, there
can be several such elements. Hence, we cannot completely determine what is in
S, or in S̄, or in both.
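
The difference in behavior between a decider and a recognizer can be sketched in Python. This is a hedged illustration only: the function names and the two example sets are assumptions made here, not part of the text. Note also that the set of perfect squares is in fact decidable; the second function merely shows the shape of an algorithm that answers YES on members and runs forever on non-members.

```python
def decide_even(x):
    """A decider D_S for S = even numbers in U = N: halts on every input."""
    return x % 2 == 0          # True means YES, False means NO

def recognize_square(x):
    """A recognizer-style search for S = perfect squares in U = N.

    It halts with YES exactly when x is in S. When x is not in S, the loop
    below never terminates, so no NO answer is ever produced.
    """
    k = 0
    while True:
        if k * k == x:
            return True        # YES, x is in S
        k += 1
```

Calling `recognize_square(49)` returns, whereas `recognize_square(50)` would run forever; a caller cannot tell "not yet" from "never" by waiting.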


¹² The older name is recursive set. We describe the reasons for a new name on p. 153.
A word of caution. Observe that a decidable set is by definition also semi-decidable.
Hence, a decider of a set is also a recognizer of the set. The converse
does not hold in general; we will prove later that there are semi-decidable sets that
are not decidable.


Remarks. The adjectives decidable and computable will be used as synonyms, chosen according to
the context. For example, we will say “decidable set” in order to stress that the set is completely
recognizable in its universe. But we will say “computable set” in order to stress that the characteristic
function of the set is computable on the universe. The reader should always bear in mind
the alternative adjective and its connotation. We will use the adjectives undecidable and incomputable
similarly. Interestingly, the adjective semi-decidable will find a synonym in the adjective
computably enumerable (c.e.). In the following we explain why this is so.


6.3.4 Generation vs. Recognition 


In this section we show that set generation and set recognition are closely connected
tasks. We do this in two steps: First, we prove that every c.e. set is also
semi-decidable; then we prove that every semi-decidable set is also c.e. So, let U be
the universe and S ⊆ U be an arbitrary set.


Fig. 6.20 R_S checks whether G_S generated x


Suppose that the set S is c.e. We can use the generator G_S to answer the question
“Is x a member of S?” for arbitrary x ∈ U. To do this, we enhance G_S so that the
resulting algorithm A checks each generated word to see whether or not it is x (see
Fig. 6.20). If and when this happens, the algorithm A outputs the answer YES (i.e.,
x ∈ S) and halts; otherwise, the algorithm A continues generating and checking the
members of S. (Clearly, when in truth x ∉ S and S is infinite, A never halts.) It is
obvious that A is R_S, a recognizer of the set S. Thus, S is semi-decidable.

We have proven the following theorem. 


Theorem 6.2. A set S is c.e. ⟹ S is semi-decidable in U.
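
The construction behind this theorem can be sketched in a few lines of Python. The sketch is an assumption-laden illustration, not the book's machine: a Python generator plays the role of G_S, and the example set (multiples of 3) and all names are hypothetical.

```python
from itertools import count

def generator_of_S():
    """Hypothetical generator G_S; here S is the set of multiples of 3."""
    for i in count():
        yield 3 * i

def recognizer(x, generator=generator_of_S):
    """R_S built from G_S: compare each generated word with x."""
    for word in generator():
        if word == x:
            return True    # YES, x is in S
    # Reached only when the generator stops, i.e., S is finite and x is
    # not in S. For an infinite S and x not in S the loop never ends.
    return False
```

For instance, `recognizer(12)` returns after five generated words, while `recognizer(5)` would never return, which is exactly the behavior permitted for a recognizer.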
What about the other way around? Is a semi-decidable set also c.e.? In other
words, given a recognizer R_S of S, can we construct a generator G_S of S? The
answer is yes, but the proof is trickier.


Fig. 6.21 a) A naive construction of G_S that works only for decidable sets S. b) An improved
construction of G_S that works for any semi-decidable set S


Suppose that the set S is semi-decidable. The naive construction of G_S is as follows
(see Fig. 6.21a). Assuming that U is c.e., G_S uses 1) a generator G_U to generate
elements of U and 2) a recognizer R_S to find out, for each generated x ∈ U, whether
or not x ∈ S. If R_S answers YES, then G_S outputs (generates) x, and if the answer is
NO, then G_S outputs nothing. However, there is a pitfall in this approach: If in truth
x ∉ S, then the answer NO is not guaranteed (because S is only semi-decidable). So G_S
may wait indefinitely for it. In that case, the operation of G_S drags on forever,
although there are still elements of U that should be generated
and checked for membership in S.

This trap can be avoided by a technique called dovetailing (see Fig. 6.21b). The
idea is to ensure that G_S waits for R_S’s answer for only a finitely long time. To
achieve this, G_S must allot to R_S a finite number of steps, say j, to answer the question
x ∈? S. If R_S answers YES (i.e., x ∈ S) in exactly j steps, then G_S generates (outputs)
x. Otherwise, G_S asks R_S to recognize, in the same controlled fashion, some
other candidate element of U. Of course, G_S must later restart the recognition
of x, but this time allotting to R_S a larger number of steps.

To implement this idea, G_S must systematically label and keep track of each
candidate element of U (e.g., with the natural number i) as well as of the currently
allotted number of steps (e.g., with the natural number j). For this reason, G_S uses
a generator G_{N²}, which systematically generates pairs (i, j) ∈ N² in such a way that
each pair is generated exactly once. The details are given in the proof of the next
theorem (see Box 6.3).
Theorem 6.3. Let the universe U be c.e. Then:
A set S is semi-decidable in U ⟹ S is c.e.


Box 6.3 (Proof). 


Recall that a c.e. set is the range of a computable function on N (Corollary 6.1, p. 140). Hence, there
is a computable function f : N → U such that, for every x ∈ U, there is an i ∈ N for which x = f(i).
Thus, the set U can be generated by successively computing the values f(0), f(1), f(2), ... After
f(i) is computed, it is fed into the recognizer R_S for a controlled recognition, i.e., a finite number j
of steps are allotted to R_S in which it tries to decide x ∈? S. (This prevents R_S from getting trapped
in an endless computation of χ_S(x).) If the decision is not made in the jth step, it might be made
in the following step. So R_S tries again later to recognize x, this time with j + 1 allotted steps.

Obviously, we need a generator capable of generating all the pairs (i, j) of natural numbers in
such a way that each pair is generated exactly once. This can be done by a generator that generates
pairs in the order of visiting dots (representing pairs) shown in Fig. 6.22.


Fig. 6.22 The order of generating pairs (i, j) ∈ N². Each pair is generated exactly once


The initial generated pairs are (0,0), (0,1), (1,0), (0,2), (1,1), (2,0), (0,3), (1,2), (2,1), (3,0), ...

The generator must output pairs (i, j) ∈ N² so that, for each k = 0, 1, 2, 3, ..., it systematically
generates all the pairs having i + j = k. These pairs correspond to the dots on the same diagonal
in the table. It is not hard to conceive such a generator. The generator successively increments the
variable k and, for each k, first outputs (0, k) and then all the remaining pairs up to and including
(k, 0), where each pair is constructed from the previous one by incrementing its first component
and decrementing its second component by 1.
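
The diagonal-by-diagonal description above translates almost literally into code. The following Python sketch is an illustration only (the name `pairs` is chosen here); a Python generator stands in for the pair-generating machine.

```python
from itertools import count, islice

def pairs():
    """Yield every pair (i, j) of natural numbers exactly once."""
    for k in count():              # k = i + j selects one diagonal
        for i in range(k + 1):     # from (0, k) up to and including (k, 0)
            yield (i, k - i)

first_ten = list(islice(pairs(), 10))
```

The first ten pairs produced are (0,0), (0,1), (1,0), (0,2), (1,1), (2,0), (0,3), (1,2), (2,1), (3,0), in agreement with the list given in the proof.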

This was but an intuitive description of the generator’s algorithm (program). However, according
to the Computability Thesis, there exists an actual Turing machine performing all the described
tasks. We denote this Turing machine by G_{N²}. It is easy to see that, for arbitrary i, j ∈ N, the
machine G_{N²} generates every (i, j) exactly once.

How will G_{N²} be used? With the pair generator G_{N²}, the function f, and the recognizer R_S, we
can now construct an improved generator G_S of the set S. (See Fig. 6.21b.) This generator repeats
the following four steps: 1) it demands from G_{N²} the next pair (i, j); 2) it generates an element
x ∈ U by computing x := f(i); 3) it demands from R_S the answer to the question x ∈? S in exactly
j steps; 4) if the answer is YES (i.e., x ∈ S), then G_S outputs (generates) x; otherwise it generates
nothing. It then returns to 1).
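
The four steps of the improved generator can be tried out in a small Python model. Everything here is a sketch under assumptions: a recognizer is modeled as a Python generator function that yields once per computation step and finishes when it answers YES; the universe is N with f the identity; and the example set (perfect squares) and all names are hypothetical.

```python
from itertools import count, islice

def pairs():
    """Pair generator: all (i, j), diagonal by diagonal, each exactly once."""
    for k in count():
        for i in range(k + 1):
            yield (i, k - i)

def square_recognizer(x):
    """Step-wise recognizer: one `yield` per step, returns on YES.

    It never finishes when x is not a perfect square.
    """
    k = 0
    while k * k != x:
        yield
        k += 1

def accepts_in_exactly(recognizer, x, j):
    """Run recognizer on x with j allotted steps; True iff YES comes at step j."""
    run = recognizer(x)
    for step in range(j + 1):
        try:
            next(run)
        except StopIteration:      # the recognizer answered YES
            return step == j
    return False                   # no answer within the allotted steps

def dovetail(f, recognizer):
    """Improved generator G_S: steps 1) to 4) of the proof."""
    for i, j in pairs():                          # 1) next pair (i, j)
        x = f(i)                                  # 2) x := f(i)
        if accepts_in_exactly(recognizer, x, j):  # 3) bounded recognition
            yield x                               # 4) output x on YES

first_five = list(islice(dovetail(lambda i: i, square_recognizer), 5))
```

Because the answer must arrive in exactly j steps, an element x = f(i) is output only for the single pair (i, j) whose j matches the recognizer's running time, so the non-terminating runs on non-members never block the enumeration.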
6.3.5 The Standard Universes Σ* and N


In the rest of the book, the universe U will be either Σ*, the set of all the words over
the alphabet Σ, or N, the set of all natural numbers. In what follows we show that
both Σ* and N are c.e. sets. This will closely link the tasks of set generation and set
recognition.


Theorem 6.4. Σ* and N are c.e. sets.


Proof. We intuitively describe the generators of the two sets. 

a) The generator G_{Σ*} will output words in shortlex order (i.e., in order of increasing length,
and, in the case of equal length, in lexicographical order; see Appendix A, p. 369). For example,
for Σ = {a, b, c}, the first generated words are

ε,
a, b, c,
aa, ab, ac, ba, bb, bc, ca, cb, cc,
aaa, aab, aac, aba, abb, abc, aca, acb, acc, baa, bab, bac, bba, bbb, bbc, bca, bcb, bcc, caa, cab, cac, cba, cbb, cbc, cca, ccb, ccc,
...

To generate the words of length ℓ + 1, G_{Σ*} does the following: For each previously generated word
w of length ℓ, it outputs the words ws for each symbol s ∈ Σ.
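
The level-by-level rule of part a) can be sketched in Python as follows. This is an illustration only; the name `shortlex` is chosen here, and a Python generator replaces the Turing machine.

```python
from itertools import islice

def shortlex(alphabet):
    """Yield all words over `alphabet` in shortlex order.

    `alphabet` is a sequence of symbols listed in their alphabetical order.
    """
    level = [""]                   # all words of the current length
    while True:
        yield from level
        # words of length l + 1: append each symbol s to each word w of length l
        level = [w + s for w in level for s in alphabet]

first_words = list(islice(shortlex("abc"), 13))
```

Here the empty string plays the role of ε, so the prefix collected above consists of ε, the three one-letter words, and the nine two-letter words.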

b) The generator G_N will generate binary representations of natural numbers n = 0, 1, .... To
achieve this, it operates as follows. First, it generates the two words of length ℓ = 1, that is, 0 and
1. Then, to generate all the words of length ℓ + 1, it outputs, for each previously generated word w
of length ℓ that starts with 1, the words w0 and w1. (In this way the words with leading 0s are not
generated.) For example, the first generated binary representations and the corresponding natural
numbers are:

0, 1                                              0, 1
10, 11                                            2, 3
100, 101, 110, 111                                4, 5, 6, 7
1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111    8, 9, 10, 11, 12, 13, 14, 15
...


The existence of the Turing machines G_{Σ*} and G_N is assured by the Computability Thesis.
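
Part b) of the proof admits an equally short Python sketch. Again this is only an illustration with a name chosen here (`binary_numerals`), not the machine itself.

```python
from itertools import islice

def binary_numerals():
    """Yield binary representations of 0, 1, 2, ... without leading zeros."""
    level = ["0", "1"]             # the two words of length 1
    while True:
        yield from level
        # extend only the words that start with 1, by appending 0 and 1
        level = [w + s for w in level if w.startswith("1") for s in "01"]

first_sixteen = list(islice(binary_numerals(), 16))
```

Filtering on the leading 1 is what keeps the word 0 from spawning 00 and 01, so every natural number gets exactly one representation.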


Combining Theorems 6.2, 6.3, and 6.4 we find that when the universe is Σ* or N,
set generation and set recognition are closely linked tasks.


Corollary 6.2. Let the universe U be Σ* or N. Then:
A set S is semi-decidable in U ⟺ S is c.e.


Remark. Since in the following the universe will be either Σ* or N, the corollary allows us to use
the adjectives semi-decidable and computably enumerable as synonyms. Similarly to the pairs of 
adjectives decidable-computable and undecidable-incomputable, we will use them in accordance 
with our wish to stress the amenability of a set of interest to recognition or generation. 
6.3.6 Formal Languages vs. Sets of Natural Numbers


In this section we show that it is not just by chance that both Σ* and N are c.e. sets.
This is because there is a bijective function from Σ* to N. We prove this in Box 6.4.


Theorem 6.5. There is a bijection f_|Σ| : Σ* → N.


Box 6.4 (Proof). 


We prove the theorem for the alphabet Σ = {a_0, a_1} and then for a general Σ = {a_0, a_1, ..., a_{p−1}}.

a) Let Σ = {a_0, a_1}. Imagine that a_0 and a_1 represent the numbers 0 and 1, respectively. In
general, we will say that u ∈ {a_0, a_1}* represents a natural number if u has no leading symbols a_0
(with the exception of u = a_0). For example, a_0 a_1 represents no natural number. We can now
define the function #_2 : {a_0, a_1}* → N as follows: If u ∈ {a_0, a_1}* represents a natural number,
then let #_2(u) be that number. For example, #_2(a_1 a_0 a_1) = 5 while #_2(a_0 a_1) is undefined. Now let
w ∈ {a_0, a_1}* be an arbitrary word. We associate w with a natural number f_2(w), where

$$f_2(w) \overset{\text{def}}{=} \#_2(a_1 w) - 1.$$

For example, f_2(ε) = 0; f_2(a_0) = 1; f_2(a_1) = 2; f_2(a_0 a_0) = 3; f_2(a_0 a_1) = 4; f_2(a_1 a_0) = 5; f_2(a_1 a_1) = 6.
Next, we prove that f_2 : {a_0, a_1}* → N is bijective. The function f_2 is injective: If w_1 ≠ w_2, then
a_1 w_1 ≠ a_1 w_2, so #_2(a_1 w_1) − 1 ≠ #_2(a_1 w_2) − 1, and f_2(w_1) ≠ f_2(w_2). The function f_2 is surjective: If
n is an arbitrary natural number, then n = f_2(w), where w is obtained from the binary code of n + 1
by canceling the leading symbol (which is a_1). For example, for n = 4, the binary code of n + 1 is
a_1 a_0 a_1, so w = a_0 a_1. Thus, f_2(a_0 a_1) = 4.

b) Let Σ = {a_0, a_1, ..., a_{p−1}} be an arbitrary alphabet. If we interpret each a_i as the number i,
then we can define that a word u ∈ {a_0, a_1, ..., a_{p−1}}* represents a natural number if u has no
leading symbols a_0 (with the exception of u = a_0). In other words, u represents a natural number
if u is the code of a natural number in the positional number system with base p. Now define
the function #_p : {a_0, a_1, ..., a_{p−1}}* → N as follows: If u ∈ {a_0, a_1, ..., a_{p−1}}* represents a natural
number, then let #_p(u) be that number. For example, if p = 3 and u = a_2 a_1, then #_3(a_2 a_1) = 2·3^1 +
1·3^0 = 7. Consider the word u = a_0 a_0 ... a_0 ∈ {a_0, a_1, ..., a_{p−1}}*. Then a_1 u represents the number
#_p(a_1 u) = #_p(a_1 a_0 a_0 ... a_0) = p^{|u|}. In canonical (i.e., shortlex) order there are p^0 + p^1 + ... + p^{|u|−1}
words preceding the word u. Assuming that the searched-for function f_p maps these words in
the same order to consecutive numbers 0, 1, ..., then u is mapped to the number
p^0 + p^1 + ... + p^{|u|−1} = (p^{|u|} − 1)/(p − 1).
Finally, let w ∈ {a_0, a_1, ..., a_{p−1}}* be an arbitrary word. The function f_p must map w to the number
(p^{|w|} − 1)/(p − 1) + #_p(a_1 w) − #_p(a_1 a_0 ... a_0). After rearranging we obtain

$$f_p(w) = \#_p(a_1 w) - p^{|w|} + \frac{p^{|w|} - 1}{p - 1}.$$

Take, for example, Σ = {a_0, a_1, a_2}. Then w = a_2 a_1 is mapped to f_3(a_2 a_1) = #_3(a_1 a_2 a_1) − 3^2 + (3^2 − 1)/(3 − 1) =
16 − 9 + 4 = 11. The function f_p : Σ* → N is bijective. We leave the proof of this as an exercise.
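
The closed formula for f_p is easy to check numerically. The Python sketch below is an illustration under assumptions made here: a word is represented as a tuple of digits, with digit i standing for the symbol a_i, and p ≥ 2 is assumed so that the division by p − 1 is defined. The helper names are hypothetical.

```python
from itertools import product

def f_p(word, p):
    """Map a word over {a_0, ..., a_(p-1)} to its position in shortlex order.

    `word` is a tuple of digits in range(p); p >= 2 is assumed.
    """
    value = 1                          # the prepended symbol a_1
    for digit in word:
        value = value * p + digit      # base-p value of a_1 w, i.e., #_p(a_1 w)
    n = len(word)
    return value - p**n + (p**n - 1) // (p - 1)

def shortlex_words(p, max_len):
    """All words of length at most max_len, in shortlex order."""
    return [w for n in range(max_len + 1) for w in product(range(p), repeat=n)]

positions = [f_p(w, 3) for w in shortlex_words(3, 3)]
```

For p = 3 the 40 words of length at most 3 are mapped to 0, 1, ..., 39 in order, and `f_p((2, 1), 3)` reproduces the value 11 computed in the box. Such a finite check supports, but of course does not replace, the proof of bijectivity left as an exercise.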


A subset of the set Σ* is said to be a formal language over the alphabet Σ. The
above theorem states that every formal language S ⊆ Σ* is associated with exactly
one set f_|Σ|(S) ⊆ N of natural numbers, and vice versa. How can we use this? The
answer is given in the following remark.
NB When a property of sets is independent of the nature of their elements, we are
allowed to choose whether to study the property using formal languages or sets 
of natural numbers. The results will apply to the alternative, too. For Computabil- 
ity Theory, three properties of this kind are especially interesting: the decidability, 
semi-decidability, and undecidability of sets. We will use the two alternatives based 
on the context and the ease and clarity of the presentation. 


6.4 Chapter Summary 


In addition to the basic model of the Turing machine there are several variants. Each 
is a generalization of the basic model in some respect. Nevertheless, the basic model 
is capable of computing anything that can be computed by any other variant. 

Turing machines can be encoded by words consisting of Os and 1s. This enables 
the construction of the universal Turing machine, a machine that is capable of sim- 
ulating any other Turing machine. Thus, the universal Turing machine can compute 
anything that can be computed by any other Turing machine. Practical consequences 
of this are the existence of the general-purpose computer and the operating system. 

RAM is a model of computation that is equivalent to the Turing machine but is 
more appropriate in Computational Complexity Theory, where time and space are 
bounded computational resources. 

Turing machines can be enumerated and generated. This allows us to talk about 
the nth Turing machine, for any natural n. The Turing machine can be used as a 
computer (to compute values of a function), or as a generator (to generate elements 
of a set), or as a recognizer (to find out which objects are members of a set and 
which are not). 

A function φ : A → B is said to be partial computable on A if there exists a
Turing machine capable of computing the function’s values wherever the function
is defined. A partial computable (p.c.) function φ : A → B is said to be computable
on A if it is defined for every member of A. A partial function is incomputable if
there exists no Turing machine capable of computing the function’s values wherever
the function is defined.

A set is decidable in a universe if the characteristic function of the set is com- 
putable on the universe. A set is undecidable in a universe if the characteristic func- 
tion of the set is incomputable on the universe. A set is semi-decidable if the charac- 
teristic function of the set is computable on this set. A set is computably enumerable 
(c.e.) if there exists a Turing machine capable of generating exactly the members of 
the set. 

There is a bijective function between the sets Σ* and N; so instead of studying
the decidability of formal languages, we can study the decidability of sets of natural
numbers.
Problems


6.1. Given Q and δ, how many reduced TMs T = (Q, {0, 1}, {0, 1, ⊔}, δ, q_1, ⊔, {q_2}) are there?


6.2. Informally design an algorithm that restores Σ, Γ, Q, and F from the code ⟨T⟩ of a basic TM.


6.3. Informally design the following Turing machines:

(a) a basic TM that accepts the language {0^n 1^n | n ≥ 1};
[Hint. Repeatedly replace the leftmost 0 by X and the rightmost 1 by Y.]

(b) a TM that accepts the language {0^n 1^n 0^n | n ≥ 1};

(c) a basic TM that accepts an input if its first symbol does not appear elsewhere on the input;
[Hint. Use a finite-storage TM.]

(d) a TM that recognizes the set of words with an equal number of 0s and 1s;

(e) a TM that decides whether or not a binary input greater than 2 is a prime;
[Hint. Use a multi-track TM. Let the input be on the first track. Starting with the number 1,
repeatedly generate on the second track the next larger number less than the input;
for each generated number copy the input to the third track and then subtract the
second-track number from the third-track number as many times as possible.]

(f) a TM that recognizes the language {wcw | w ∈ {0, 1}*};
[Hint. Check off the symbols of the input by using a multi-track TM.]

(g) a TM that recognizes the language {ww | w ∈ {0, 1}*};
[Hint. Locate the middle of the input.]

(h) a TM that recognizes the language {ww^R | w ∈ {0, 1}*} of palindromes;
[Hint. Use a multi-tape TM.]

(i) a TM that shifts its tape contents by three cells to the right;
[Hint. Use a finite-storage TM to perform caterpillar-type movement.]

(j) Adapt the TM from (i) to move its tape contents by three cells to the left.


6.4. Prove: A nondeterministic d-dimensional tp-tape tk-track TM can be simulated by a basic TM. 


6.5. Prove:

(a) A set is c.e. iff it is generated by a TM.
(b) If a set is c.e. then there is a generator that generates each element in the set exactly once.
(c) A set is computable iff it is generated by a TM in shortlex order.
(d) Every c.e. set is accepted by a TM with two nonaccepting states and one accepting state.


6.6. Informally design TMs that compute the following functions (use any version of TM):

(a) const_j^k(n_1, ..., n_k) ≝ j, for j ≥ 0 and k ≥ 1;
(b) add(m, n) ≝ m + n;
(c) mult(m, n) ≝ mn;
[Hint: start with 0^m 1 0^n, put 1 after it, and copy 0^n onto the right end m times.]
(d) power(m, n) ≝ m^n;
(e) fact(n) ≝ n!;
(f) tower(m, n) ≝ m^m^...^m (n levels);
(g) minus(m, n) ≝ m − n if m ≥ n; 0 otherwise;
(h) div(m, n) ≝ m ÷ n = ⌊m/n⌋;
(i) floorlog(n) ≝ ⌊log_2 n⌋;
(j) log*(n) ≝ the smallest k such that log(log(···(log(n))···)) ≤ 1, where log is applied k times;
(k) gcd(m, n) ≝ greatest common divisor of m and n;
(l) lcm(m, n) ≝ least common multiple of m and n;
(m) prime(n) ≝ the nth prime number;
(n) π(x) ≝ the number of primes not exceeding x;
(o) φ(n) ≝ the number of positive integers that are ≤ n and relatively prime to n (Euler funct.);
(p) max^k(n_1, ..., n_k) ≝ max{n_1, ..., n_k}, for k ≥ 1;


at JO ifx 1; 
© neo feo: 


(s) and(x, y) ≝ 1 if x ≥ 1 ∧ y ≥ 1; 0 otherwise;

(t) or(x, y) ≝ 1 if x ≥ 1 ∨ y ≥ 1; 0 otherwise;

(u) if-then-else(x, y, z) ≝ y if x ≥ 1; z otherwise;


0 otherwise. 


1 ifx>y; 
0 ifx<y. 


def 


v) cals) 44 pied 
(w) gr(x,y) = 


ar Jl ifx2y; 
0 ifx<y. 
1 




«fl ifx<y; 
1 . def SIs 
ey i ifx>y. 


6.7. Prove the following Basic Theorem, which provides an alternative characterization of c.e. sets. 


Theorem 6.6. (Basic Theorem on C.E. Sets) A set S is c.e. ⟺ S is the domain of a p.c. function.


[Hint. (⟹) Let S be c.e. We will use Corollary 6.1, p. 140. If S = ∅, then S = dom(φ), where φ is
the everywhere undefined p.c. function. Otherwise, S = rng(f) of a computable function f. Define
a p.c. function φ as follows: φ(x) := x if x eventually appears in f(0), f(1), f(2), ...; else φ(x) := ↑.
Then S = dom(φ).

(⟸) Let S = dom(φ) where φ is a p.c. function. The set dom(φ) is semi-decidable. (Given an x,
the recognizer R_dom(φ) starts computing φ(x) and, if the computation terminates, answers YES.)
Now apply Theorem 6.3, p. 145.]


Remark. There is a similar statement about the range of a p.c. function (see Problem 7.2c, p. 171). 


Definition 6.11. (Pairing Function) A pairing function is any computable bijection f : N² → N
whose inverse functions f_1^{-1}, f_2^{-1}, defined by f(f_1^{-1}(n), f_2^{-1}(n)) = n, are computable. Therefore,
f(i, j) = n iff f_1^{-1}(n) = i and f_2^{-1}(n) = j. The standard (or Cantor) pairing function p : N² → N
is defined by

$$p(i, j) \overset{\text{def}}{=} \tfrac{1}{2}(i + j)(i + j + 1) + i.$$


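6.8. Prove: The function p(i, j) is a pairing function with inverse functions

$$p_1^{-1}(n) = i = n - \tfrac{1}{2} w(w + 1) \quad\text{and}\quad p_2^{-1}(n) = j = w - i,$$

where

$$w = \left\lfloor \frac{\sqrt{8n + 1} - 1}{2} \right\rfloor .$$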
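
A numerical experiment can make the statement of Problem 6.8 plausible before one attempts the proof. The Python sketch below is an illustration only: it assumes the Cantor pairing p(i, j) = (i + j)(i + j + 1)/2 + i of Definition 6.11 and the usual closed form w = ⌊(√(8n + 1) − 1)/2⌋ for the index of the diagonal; the names `pair` and `unpair` are chosen here.

```python
from math import isqrt

def pair(i, j):
    """Cantor pairing p(i, j) = (i + j)(i + j + 1)/2 + i."""
    return (i + j) * (i + j + 1) // 2 + i

def unpair(n):
    """Inverse of pair: recover (i, j) from n."""
    w = (isqrt(8 * n + 1) - 1) // 2    # w = i + j, the index of the diagonal
    i = n - w * (w + 1) // 2
    return (i, w - i)

round_trip_ok = all(unpair(pair(i, j)) == (i, j)
                    for i in range(50) for j in range(50))
```

Integer square roots (`math.isqrt`) are used instead of floating-point ones so that the floor is exact for large n. The check covers only finitely many pairs and therefore is no substitute for the requested proof.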


6.9. What is the connection between p and the generation of pairs in Fig. 6.22 (p.145)? 


6.10. We can use pairing functions to define bijective functions that map N^k onto N.
(a) Describe how we can use p from Prob. 6.8 to define and compute a bijection p^(3) : N³ → N.

(b) Can you find the corresponding inverse functions (p^(3))_1^{-1}, (p^(3))_2^{-1}, (p^(3))_3^{-1}?

(c) How would you generalize to bijections from N^k onto N, where k ≥ 3?
[Hint. Define p^(k) : N^k → N by p^(k)(i_1, ..., i_k) ≝ p(p^(k−1)(i_1, ..., i_{k−1}), i_k) for k ≥ 3, and
p^(2) ≝ p.]


6.11. Informally design the following generators (use any version of TM):
(a) G_{N³}, the generator of 3-tuples (i_1, i_2, i_3) ∈ N³;
(b) G_{N^k}, the generator of k-tuples (i_1, ..., i_k) ∈ N^k (for k ≥ 3);
(c) G_ℚ, the generator of rational numbers.
Remark. (On the new terminology in Computability Theory.) The terminology in the papers of the
1930s is far from uniform. This is not surprising, though, as the theory was just born. Initially, partial
recursive function was used for any function constructed using the rules of Gödel and Kleene
(see p. 81). However, after the Computability Thesis was accepted, the adjective “partial recursive”
expanded to eventually designate any computable function, regardless of the model of computation
on which the function was defined. At the same time, the Turing machine became widely accepted
as the most convincing model of computation. But the functions computable by Turing machines
do not exhibit recursiveness (i.e., self-reference) as explicitly as the functions computable in
Gödel-Kleene’s model of computation. (We will explain this in the Recursion Theorem, Sect. 7.4.)
Consequently, the adjective “partial recursive” lost its sense and was groundless when the subjects
under discussion were Turing-computable functions. A more appropriate adjective would be
“computable” (which was used by Turing himself). Nevertheless, with time the whole research
field took the name Recursion Theory, in spite of the fact that its prime interest has always been
computability (and not only recursion).

In 1996, Soare¹³ proposed corrections to the terminology so that notions and concepts would
regain their original meanings, as intended by the first researchers (see Soare [242]). In summary:


• the term computable and its variants should be used in connection with notions: computation,
algorithm; Turing machine, register machine; function (defined on one of these models), set
(generated or recognized on one of these models); relative computability;

• the term recursive and its variants should be used in connection with notions: recursive (inductive)
definition; (general) recursive function (Herbrand-Gödel), primitive recursive and
μ-recursive function (Gödel-Kleene) and some other notions from the theory of recursive
functions.


¹³ Robert Irving Soare, b. 1940, American mathematician.


Chapter 7
The First Basic Results


Recursion is a method of defining objects in which the object 
being defined is applied within its own definition. 


Abstract In the previous chapters we have defined the basic notions and concepts 
of a theory that we are interested in, Computability Theory. In particular, we have 
rigorously defined its basic notions, i.e., the notions of algorithm, computation, and 
computable function. We have also defined some new notions, such as the decidabil- 
ity and semi-decidability of a set, that will play key roles in the next chapter (where 
we will further develop Computability Theory). As a side product of the previous 
chapters we have also discovered some surprising facts, such as the existence of the 
universal Turing machine. It is now time to start using this apparatus and deduce 
the first theorems of Computability Theory. In this chapter we will first prove sev- 
eral simple but useful theorems about decidable and semi-decidable sets and their 
relationship. Then we will deduce the so-called Padding Lemma and, based on it, 
introduce the extremely important concept of the index set. This will enable us to de- 
duce two influential theorems, the Parameter Theorem and the Recursion Theorem. 
We will not be excessively formal in our deductions; instead, we will equip them 
with meaning and motivation wherever appropriate. 


7.1 Some Basic Properties of Semi-decidable (C.E.) Sets 


For starters, we prove in this section some basic properties of semi-decidable sets. 
In what follows, A, B, and S are sets. 


Theorem 7.1. $S$ is decidable $\Longrightarrow$ $S$ is semi-decidable


Proof. This is a direct consequence of Definition 6.10 (p. 142) of (semi-)decidable sets. 

Theorem 7.2. $S$ is decidable $\Longrightarrow$ $\overline{S}$ is decidable


Proof. The decider $D_{\overline{S}}$ starts $D_S$ and reverses its answers.


The next theorem is due to Post and is often used. 


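Theorem 7.3. (Post’s Theorem) $S$ and $\overline{S}$ are semi-decidable $\Longleftrightarrow$ $S$ is decidable


Proof. Let $S$ and $\overline{S}$ be semi-decidable sets. Then there are recognizers $R_S$ and $R_{\overline{S}}$. Every $x \in \mathcal{U}$
is a member of either $S$ or $\overline{S}$; the former situation can be detected by $R_S$ and the latter by $R_{\overline{S}}$. So
let us combine the two recognizers into the following algorithm: 1) Given $x \in \mathcal{U}$, simultaneously
start $R_S$ and $R_{\overline{S}}$ on $x$ and wait until one of them answers YES. 2) If the YES came from $R_S$, output
YES (i.e., $x \in S$) and halt; otherwise (i.e., the YES came from $R_{\overline{S}}$) output NO (i.e., $x \notin S$) and halt.
This algorithm decides, for arbitrary $x \in \mathcal{U}$, whether or not $x$ is in $S$. Thus, the algorithm is $D_S$, a
decider for $S$. Conversely, if $S$ is decidable, then so is $\overline{S}$ (Theorem 7.2), and hence both $S$ and $\overline{S}$
are semi-decidable (Theorem 7.1).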
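
The two-step algorithm in the proof can be sketched in Python. This is only an illustration under stated assumptions: a recognizer is modeled as a generator that yields False while it is still running and True when it answers YES, and the helper names (make_recognizer, decide) are hypothetical, not taken from the text.

```python
from typing import Callable, Iterator

# Assumption: a recognizer is a function x -> generator of "answers so far".
Recognizer = Callable[[int], Iterator[bool]]


def make_recognizer(is_member: Callable[[int], bool]) -> Recognizer:
    """Hypothetical recognizer: says YES after some work on members,
    and runs forever (never answers) on non-members."""
    def run(x: int) -> Iterator[bool]:
        if is_member(x):
            for _ in range(x):   # pretend to compute for a while
                yield False
            yield True           # answer YES
        else:
            while True:          # no answer will ever come
                yield False
    return run


def decide(r_s: Recognizer, r_co_s: Recognizer, x: int) -> bool:
    """Decider D_S: advance both recognizers in lockstep until one says YES."""
    run_s, run_co_s = r_s(x), r_co_s(x)
    while True:
        if next(run_s):          # YES came from R_S, so x is in S
            return True
        if next(run_co_s):       # YES came from the recognizer of the complement
            return False


# Example: S is the set of even numbers, its complement the odd numbers.
r_even = make_recognizer(lambda x: x % 2 == 0)
r_odd = make_recognizer(lambda x: x % 2 == 1)
```

The loop always terminates because every x belongs to exactly one of the two sets, so one of the two generators eventually yields True.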


Theorem 7.4. $S$ is semi-decidable $\Longleftrightarrow$ $S$ is the domain of a computable function


Proof. If $S$ is semi-decidable, then it is (by Definition 6.10 on p. 142) the domain of the
characteristic function $\chi_S$, which is computable on $S$. Inversely, if $S$ is the domain of a computable
function $\varphi$, then the characteristic function, defined by $\chi_S(x) = 1 \Longleftrightarrow \varphi(x){\downarrow}$, is computable on $S$
(see Definition 6.6 on p. 136). Hence, $S$ is semi-decidable.


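Theorem 7.5.


a) $A$ and $B$ are semi-decidable $\Longrightarrow$ $A \cup B$ and $A \cap B$ are semi-decidable
b) $A$ and $B$ are decidable $\Longrightarrow$ $A \cup B$ and $A \cap B$ are decidable


Proof. a) Let $A$ and $B$ be semi-decidable sets. Their characteristic functions $\chi_A$ and $\chi_B$ are
computable on $A$ and $B$, respectively. The function $\chi_{A \cup B}$, defined by


$$\chi_{A \cup B}(x) \stackrel{\text{def}}{=} \begin{cases} 1, & \text{if } \chi_A(x) = 1 \lor \chi_B(x) = 1, \\ 0, & \text{otherwise} \end{cases}$$


is the characteristic function of the set $A \cup B$ and is computable on $A \cup B$. Hence, $A \cup B$ is
semi-decidable. Similarly, the function $\chi_{A \cap B}$, defined by


$$\chi_{A \cap B}(x) \stackrel{\text{def}}{=} \begin{cases} 1, & \text{if } \chi_A(x) = 1 \land \chi_B(x) = 1, \\ 0, & \text{otherwise} \end{cases}$$


is the characteristic function of the set $A \cap B$ and is computable on $A \cap B$. So, $A \cap B$ is
semi-decidable.

b) Let $A$ and $B$ be decidable sets. Then $\chi_A$ and $\chi_B$ are computable functions on $\mathcal{U}$. Also, the
functions $\chi_{A \cup B}$ and $\chi_{A \cap B}$ are computable on $\mathcal{U}$. Hence, $A \cup B$ and $A \cap B$ are decidable sets.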
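
Part a) of the proof can be made concrete with a small Python sketch. It rests on assumptions not in the text: a recognizer is a generator that yields False while still running and True when it answers YES, and all names (recognizer, union, intersection, says_yes_within) are hypothetical. The union recognizer must advance both recognizers alternately, since either one may run forever.

```python
from typing import Callable, Iterator

Recognizer = Callable[[int], Iterator[bool]]


def recognizer(is_member: Callable[[int], bool]) -> Recognizer:
    """Hypothetical recognizer: YES on members, no answer on non-members."""
    def run(x: int) -> Iterator[bool]:
        while not is_member(x):
            yield False          # still running; never answers
        yield True
    return run


def union(r_a: Recognizer, r_b: Recognizer) -> Recognizer:
    """Recognizer of A u B: alternate steps, YES as soon as either says YES."""
    def run(x: int) -> Iterator[bool]:
        run_a, run_b = r_a(x), r_b(x)
        while True:
            if next(run_a) or next(run_b):
                yield True
                return
            yield False
    return run


def intersection(r_a: Recognizer, r_b: Recognizer) -> Recognizer:
    """Recognizer of A n B: wait for YES from R_A, then for YES from R_B."""
    def run(x: int) -> Iterator[bool]:
        for r in (r_a, r_b):
            for answer in r(x):
                if answer:
                    break
                yield False
        yield True
    return run


def says_yes_within(r: Recognizer, x: int, steps: int) -> bool:
    """Observe a recognizer for a bounded number of steps.
    False only means 'no YES yet'; a NO is never observed."""
    run = r(x)
    return any(next(run) for _ in range(steps))


evens = recognizer(lambda x: x % 2 == 0)
threes = recognizer(lambda x: x % 3 == 0)
```

For decidable sets (part b) no interleaving is needed, because both characteristic functions are total and can simply be evaluated one after the other.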


7.2 Padding Lemma and Index Sets


We have seen in Sect. 6.2.1 that each natural number can be viewed as the index
of exactly one Turing machine (see Fig. 7.1). Specifically, given an arbitrary $e \in \mathbb{N}$,
we can find the corresponding Turing machine $T_e = (Q, \Sigma, \Gamma, \delta, q_1, \sqcup, F)$ by the
following algorithm:


if the binary code of $e$ is of the form $111\,K_1\,11\,K_2\,11 \ldots 11\,K_r\,111$, where each
$K$ is of the form $0^i 1 0^j 1 0^k 1 0^\ell 1 0^m$ for some $i, j, k, \ell, m$,
then determine $\delta, Q, \Sigma, \Gamma, F$ by inspecting $K_1, K_2, \ldots, K_r$ and taking into account
that $K = 0^i 1 0^j 1 0^k 1 0^\ell 1 0^m$ encodes the instruction $\delta(q_i, z_j) = (q_k, z_\ell, D_m)$;
else $T_e$ is the empty Turing machine.
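
The decoding algorithm above can be sketched in Python as follows. The sketch assumes that all exponents $i, j, k, \ell, m$ are at least 1 and represents an instruction simply as the tuple (i, j, k, l, m); the function names decode_index and encode_program are hypothetical.

```python
import re

# One instruction K = 0^i 1 0^j 1 0^k 1 0^l 1 0^m (assumption: all exponents >= 1).
K_PATTERN = re.compile(r"(0+)1(0+)1(0+)1(0+)1(0+)")


def decode_index(e: int) -> list:
    """Return the instructions (i, j, k, l, m) encoded by e,
    or [] (the empty Turing machine) if e is not a well-formed code."""
    code = bin(e)[2:]
    if not (code.startswith("111") and code.endswith("111")):
        return []
    instructions = []
    # Inside a well-formed code, "11" occurs only as a separator between K's.
    for block in code[3:-3].split("11"):
        match = K_PATTERN.fullmatch(block)
        if match is None:
            return []
        instructions.append(tuple(len(group) for group in match.groups()))
    return instructions


def encode_program(instructions) -> int:
    """Inverse direction: build the index from a list of (i, j, k, l, m)."""
    blocks = ["1".join("0" * n for n in ins) for ins in instructions]
    return int("111" + "11".join(blocks) + "111", 2)
```

Every natural number is accepted as input; numbers whose binary code is malformed are mapped to the empty machine, which is why each $e$ is the index of exactly one machine.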


But remember that the Turing machine $T_e$ is implicitly associated, for each $k \geq 1$,
with a $k$-ary partial computable function $\varphi_e^{(k)}$, the $k$-ary proper function of $T_e$ (see
Sect. 6.3.1). Consequently, each $e \in \mathbb{N}$ is the index of exactly one $k$-ary partial
computable function for any fixed $k \geq 1$. See Fig. 7.1.


Fig. 7.1 Each natural number $e$ is the index of exactly one partial computable function (denoted by •)


Padding Lemma 


What about the other way round? Is each Turing machine represented by exactly
one index? The answer is no; a Turing machine has several indexes. To see this, let
$T$ be an arbitrary Turing machine and


$\langle T \rangle = 111\,K_1\,11\,K_2\,11 \ldots 11\,K_r\,111$


its code. Here, each subword $K$ encodes an instruction of $T$’s program $\delta$. Let us now
permute the subwords $K_1, K_2, \ldots, K_r$ of $\langle T \rangle$. Of course, we get a different code, but
notice that the new code still represents the same Turing program $\delta$, i.e., the same set
of instructions, and hence the same Turing machine, $T$. Thus, $T$ has several different
indexes (at least $r!$, where $r$ is the number of instructions in $\delta$).

Still more: We can insert into $\langle T \rangle$ new subwords $K_{r+1}, K_{r+2}, \ldots$, where each
of them represents a redundant instruction, i.e., an instruction that will never be
executed. By such padding we can construct an unlimited number of new codes,
each of which describes a different Turing program. But notice that, when started,
each of these programs behaves (i.e., executes) in the same way as $T$’s program $\delta$.
In other words, each of the constructed indexes defines the same partial computable
function, $T$’s proper function. Formally: If $e$ is the index of $T$ and $x$ an arbitrary
index constructed from $e$ as described, then $\varphi_x \simeq \varphi_e$. (See Fig. 7.2.)

We have just proved the so-called Padding Lemma. 


Lemma 7.1. (Padding Lemma) A partial computable function has countably 
infinitely many indexes. Given one of them, countably infinitely many others 
can be generated. 
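
The padding construction can be illustrated by a short Python generator. It is a sketch under assumptions: instructions are tuples (i, j, k, l, m) encoded as $0^i 1 0^j 1 0^k 1 0^\ell 1 0^m$, the program is non-empty, and a redundant instruction is obtained by using a fresh state that no other instruction ever enters. The names encode_program and padded_indexes are hypothetical.

```python
from itertools import count, islice


def encode_program(instructions) -> int:
    """Index of a program given as a list of (i, j, k, l, m) tuples."""
    blocks = ["1".join("0" * n for n in ins) for ins in instructions]
    return int("111" + "11".join(blocks) + "111", 2)


def padded_indexes(instructions):
    """Yield infinitely many further indexes of the same (non-empty) program.

    Each step appends an instruction for a fresh state q_fresh. No existing
    instruction leads to q_fresh, so the added instruction is never executed
    and the proper function is unchanged.
    """
    top_state = max(max(i, k) for (i, j, k, l, m) in instructions)
    padded = list(instructions)
    for t in count(1):
        fresh = top_state + t
        padded.append((fresh, 1, fresh, 1, 1))   # delta(q_fresh, z_1) = (q_fresh, z_1, D_1)
        yield encode_program(padded)


program = [(1, 1, 2, 1, 1), (2, 1, 3, 2, 2)]
first_five = list(islice(padded_indexes(program), 5))
```

Since each padding step lengthens the binary code, the generated indexes are strictly increasing and therefore pairwise different.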


Fig. 7.2 Each p.c. function (denoted by $\varphi_e^{(k)}$) has countably infinitely many indexes


Index Set of a Partial Computable Function 


There may exist Turing machines whose indexes cannot be transformed one to
another simply by permuting their instructions or by instruction padding, and yet the
machines compute the same partial computable function (i.e., solve the same
problem). For instance, we described two such machines in Examples 6.1 and 6.2 (see
pp. 115, 116). The machines compute the sum of two integers in two different ways.
In other words, the machines have different programs but they compute the same
partial computable function $\varphi^{(2)} : \mathbb{N}^2 \to \mathbb{N}$, that is, the function $\varphi^{(2)}(m, n) = m + n$.
We say that these machines are equal in their global behavior (because they
compute the same function) but they differ in their local behavior (as they do this in
different ways).

All of this leads in a natural way to the concept of the index set of a partial
computable function. Informally, the index set of a p.c. function consists of the indexes
of all the Turing machines that compute this function. (See Fig. 7.3.) So let $\varphi$ be
an arbitrary p.c. function. Then there exists at least one Turing machine $T$ capable
of computing the values of $\varphi$. Let $e$ be the index of $T$. Then $\varphi \simeq \varphi_e$, where $\varphi_e$ is
the proper function of $T$ (having the same number of arguments as $\varphi$, i.e., the same
$k$-arity). Now let us collect all the indexes of all the Turing machines that compute
the function $\varphi$. We call this set the index set of $\varphi$ and denote it by $\mathrm{ind}(\varphi)$.


Definition 7.1. (Index Set) The index set of a p.c. function $\varphi$ is the set
$\mathrm{ind}(\varphi) = \{x \in \mathbb{N} \mid \varphi_x \simeq \varphi\}$.


Informally, $\mathrm{ind}(\varphi)$ contains all the (encoded) Turing programs that compute $\varphi$.
Taking into account the Computability Thesis, we can say that $\mathrm{ind}(\varphi)$ contains all the
(encoded) algorithms that compute (values of) the function $\varphi$. There are countably
infinitely many algorithms in $\mathrm{ind}(\varphi)$. Although they may differ in their local
behavior, they all exhibit the same global behavior.


Fig. 7.3 Index set $\mathrm{ind}(\varphi)$ of a p.c. function $\varphi$


Index Set of a Semi-decidable Set 


We can now in a natural way define one more useful notion. Let $S$ be an arbitrary
semi-decidable set. Of course, its characteristic function $\chi_S$ is partial computable.
But now we know that the index set $\mathrm{ind}(\chi_S)$ of the function $\chi_S$ contains all the
(encoded) algorithms that are capable of computing $\chi_S$. In other words, $\mathrm{ind}(\chi_S)$
consists of all the (encoded) recognizers of the set $S$. For this reason we also call
$\mathrm{ind}(\chi_S)$ the index set of the set $S$ and denote it by $\mathrm{ind}(S)$.


7.3 Parameter (s-m-n) Theorem 


Consider an arbitrary multi-variable partial computable function $\varphi$. Select some of
its variables and assign arbitrary values to them. This changes the role of these
variables, because the fixed variables become parameters. In this way, we obtain a
new function $\psi$ of the rest of the variables. The Parameter Theorem, which is the
subject of this section, tells us that the index of the new function $\psi$ depends only on
the index of the original function $\varphi$ and its parameters; what is more, the index of
$\psi$ can be computed, for any $\varphi$ and parameters, with a computable function $s$.

There are important practical consequences of this theorem. First, since the index
of a function represents the Turing program that computes the function’s values, the
Parameter Theorem tells us that the parameters of $\varphi$ can always be incorporated into
the program for $\varphi$ to obtain the program for $\psi$. Second, since parameters are natural
numbers, they can be viewed as indexes of Turing programs, so the incorporated
parameters can be interpreted as subprograms of the program for $\psi$.


7.3.1 Deduction of the Theorem


First, we develop the basic form of the theorem. Let $\varphi_x(y, z)$ be an arbitrary partial
computable function of two variables, $y$ and $z$. Its values can be computed by a
Turing machine $T_x$. Let us pick an arbitrary natural number, call it $\bar{y}$, and substitute
each occurrence of $y$ in the expression of $\varphi_x(y, z)$ by $\bar{y}$. We say that the variable $y$ has
been changed to the parameter $\bar{y}$. The resulting expression represents a new
function $\varphi_x(\bar{y}, z)$ of one variable $z$. Note that $\varphi_x(\bar{y}, z)$ is a partial computable function
(otherwise $\varphi_x(y, z)$ would not be partial computable). Therefore, there is a Turing
machine, call it $T_e$, that computes $\varphi_x(\bar{y}, z)$. (At this point $e$ is not known.) Since $\varphi_e$
is the proper function of $T_e$, we have $\varphi_x(\bar{y}, z) = \varphi_e(z)$. Now, the following questions
arise: “What is the value of $e$? Can $e$ be computed? If so, how can $e$ be computed?”

The next theorem states that there is a computable function whose value is $e$, for
arbitrary $x, \bar{y}$.


Theorem 7.6. (Parameter Theorem) There is an injective computable function
$s : \mathbb{N}^2 \to \mathbb{N}$ such that, for every $x, \bar{y} \in \mathbb{N}$,


$$\varphi_x(\bar{y}, z) = \varphi_{s(x,\bar{y})}(z).$$


Proof idea. Let $x, \bar{y} \in \mathbb{N}$ be given. First, we conceive a Turing machine $T$ that
operates as follows: Given an arbitrary $z$ as the input, $T$ inserts $\bar{y}$ before $z$ and then starts
simulating Turing machine $T_x$ on the inputs $\bar{y}, z$. Obviously, $T$ computes the value
$\varphi_x(\bar{y}, z)$. We then show that such a $T$ can be constructed for any pair $x, \bar{y}$. Therefore,
the index of the constructed $T$ can be denoted by $s(x, \bar{y})$, where $s$ is a computable
function mapping $\mathbb{N}^2$ into $\mathbb{N}$. The details of the proof are given in Box 7.1.


Box 7.1 (Proof of the Parameter Theorem). 


Let us describe the actions of the machine $T_e$ that will entail the equality $\varphi_x(\bar{y}, z) = \varphi_e(z)$ for every
$x, \bar{y} \in \mathbb{N}$. Initially, there is only $z$ written on $T_e$’s tape. Since $\bar{y}$ will also be needed, $T_e$ prepares it
on its tape. To do this, it shifts $z$ to the right by $\bar{y} + 1$ cells, and writes unary-encoded $\bar{y}$ (and a
separator symbol) to the emptied space. After this, $T_e$ starts simulating $T_x$’s program on inputs $\bar{y}$
and $z$ and thus computing the value $\varphi_x(\bar{y}, z)$. Consequently, $\varphi_x(\bar{y}, z) = \varphi_e(z)$.

But where is the function $s$? It follows from the above description that the program $P_e$ of the
machine $T_e$ is a sequence $P_1; P_2$ of two programs $P_1$ and $P_2$ such that $P_1$ inserts $\bar{y}$ on the tape and
then leaves the control to $P_2$, which is responsible for computing the values of the function $\varphi_x(y, z)$.
Hence, $\langle P_e \rangle = \langle P_1; P_2 \rangle$. Now we see the role of $s$: The function $s$ must compute $\langle P_1; P_2 \rangle$ for arbitrary
$x$ and $\bar{y}$. To do this, $s$ must perform the following tasks (informally):


1. Construct the code $\langle P_1 \rangle$ from the parameter $\bar{y}$.
(Note that $P_1$ is simple: It must shift the contents of the tape $\bar{y} + 1$ times to the right and write
the symbol 1 to the first $\bar{y}$ of the emptied cells.)

2. Construct the code $\langle P_2 \rangle$ from the index $x$.

3. Construct the code $\langle P_1; P_2 \rangle$ from $\langle P_1 \rangle$ and $\langle P_2 \rangle$.
(To do this, take the word $\langle P_1 \rangle \langle P_2 \rangle$ and change it so that, instead of halting, $P_1$ starts the first
instruction of $P_2$.)

4. Compute the index $e$ from $\langle P_1; P_2 \rangle$ and return $s(x, \bar{y}) := e$.


All the tasks are computable. So, $s$ is a computable function. It is injective, too.
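
A rough analogue of this construction can be sketched in Python. The sketch makes several assumptions that are not part of the text: a "program" is Python source text that defines a function main, the source text itself plays the role of the index, and the helper run stands in for the universal machine. The function s below builds the text of a new one-argument program that first fixes the parameter and then hands control to the original program, mirroring the sequence of the two programs described in the box.

```python
def run(program: str, *args: int) -> int:
    """Stand-in for a universal machine: execute the text, then call its main()."""
    namespace: dict = {}
    exec(program, namespace)
    return namespace["main"](*args)


def s(program: str, y: int) -> str:
    """Return the text of a program computing z -> program(y, z).

    The parameter y is written into the new program text itself (the role
    of the first program); the original text is embedded unchanged and
    invoked afterwards (the role of the second program).
    """
    return (
        f"P2 = {program!r}\n"
        f"def main(z):\n"
        f"    ns = {{}}\n"
        f"    exec(P2, ns)\n"
        f"    return ns['main']({y}, z)\n"
    )


# A two-variable example program (hypothetical): phi_x(y, z) = y + 10 * z.
PHI_X = "def main(y, z):\n    return y + 10 * z\n"

specialized = s(PHI_X, 7)          # text of the program for z -> phi_x(7, z)
```

Note that s only manipulates program text; it never runs the original program. Different pairs of program text and parameter yield different output texts, which loosely corresponds to the injectivity claimed in the theorem.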


Next, we generalize the theorem. Let the function $\varphi_x$ have $m \geq 1$ parameters and
$n \geq 1$ variables. That is, the function is $\varphi_x(\bar{y}_1, \ldots, \bar{y}_m, z_1, \ldots, z_n)$. The proof of the
following generalization of the Parameter Theorem uses the same ideas as the
previous one, so we omit it.


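Theorem 7.7. (s-m-n Theorem) For arbitrary $m, n \geq 1$ there is an injective
computable function $s^m_n : \mathbb{N}^{m+1} \to \mathbb{N}$ such that, for every $x, \bar{y}_1, \ldots, \bar{y}_m \in \mathbb{N}$,


$$\varphi_x(\bar{y}_1, \ldots, \bar{y}_m, z_1, \ldots, z_n) = \varphi_{s^m_n(x, \bar{y}_1, \ldots, \bar{y}_m)}(z_1, \ldots, z_n).$$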


Summary. If the variables $y_1, \ldots, y_m$ of a function $\varphi_x$ are assigned fixed values
$\bar{y}_1, \ldots, \bar{y}_m$, respectively, then we can build the values into $T_x$’s Turing program and
obtain a Turing program for computing the function $\varphi$ of the rest of the variables.
The new program is represented by the index $s^m_n(x, \bar{y}_1, \ldots, \bar{y}_m)$, where $s^m_n$ is an
injective and computable function. It can be proved that $s^m_n$ is a primitive recursive
function.


7.4 Recursion (Fixed-Point) Theorem 


The construction of the universal Turing machine brought, as a byproduct, the cog- 
nizance that a Turing machine can compute with other Turing machines simply 
by manipulating their indexes and simulating programs represented by the indexes. 
But, can a Turing machine manipulate its own index and consequently compute with 
itself? Does the question make sense? The answer to both questions is yes, as will 
be explained in this section. Namely, we have seen that there is another model of
computation, the μ-recursive functions of Gödel and Kleene (see Sect. 5.2.1), that
allows functions to be defined and hence computed by referring to themselves, i.e.,
recursively (see Box 5.1). Since the Turing machine and μ-recursive functions are
equivalent models of computation, it is reasonable to ask whether Turing machines
also allow for, and can make sense of, recursiveness (i.e., self-reference). That this
is actually so follows from the Recursion Theorem, which was proved by Kleene in 
1938. The theorem is also called the Fixed-Point Theorem. 


7.4.1 Deduction of the Theorem


1. Let $i \in \mathbb{N}$ be an arbitrary number and $f : \mathbb{N} \to \mathbb{N}$ an arbitrary computable function.


2. Definition. Given $i$ and $f$, let $T^{(i)}$ be a TM that performs the following steps:


a. $T^{(i)}$ has two input data: $i$ (as picked above) and an arbitrary $x$;
b. $T^{(i)}$ interprets input $i$ as an index and constructs the TP of $T_i$;
c. $T^{(i)}$ simulates $T_i$ on input $i$;   // i.e., $T^{(i)}$ tries to compute $\varphi_i(i)$
d. if the simulation halts, then $T^{(i)}$ performs steps (e)-(g):   // i.e., if $\varphi_i(i){\downarrow}$
e. it applies the function $f$ to the result $\varphi_i(i)$;   // i.e., $T^{(i)}$ computes $f(\varphi_i(i))$
f. it interprets $f(\varphi_i(i))$ as an index and constructs the TP of $T_{f(\varphi_i(i))}$;
g. it simulates $T_{f(\varphi_i(i))}$ on input $x$.   // In summary: $T^{(i)}$ computes $\varphi_{f(\varphi_i(i))}(x)$


We have seen that $T^{(i)}$ on input $i, x$ computes the value $\varphi_{f(\varphi_i(i))}(x)$. Thus, the
proper function of $T^{(i)}$ depends on input $i$.


3. But, the steps a-g are only a definition of the machine $T^{(i)}$. (We can define
whatever we want.) So the question arises: “For which $i \in \mathbb{N}$ does $T^{(i)}$ actually exist?”
And what is more: “For which $i \in \mathbb{N}$ can $T^{(i)}$ be constructed?”


Proposition. $T^{(i)}$ exists for every $i \in \mathbb{N}$. There is an algorithm that constructs
$T^{(i)}$ for arbitrary $i$.


Proof. We have already proved: i) every natural number is an index of a TM; ii) every index
can be transformed into a Turing program; and iii) every TM can be simulated by another TM.
Therefore, each of the steps (a)-(g) is computable. Consequently, there is, for every $i \in \mathbb{N}$, a
Turing machine that executes steps (a)-(g). We call this machine $T^{(i)}$.

Can $T^{(i)}$ be constructed by an algorithm? To answer in the affirmative, it suffices to describe
the algorithm with which we construct the program of $T^{(i)}$. Informally, we do this as follows:
We write the program of $T^{(i)}$ as a sequence of five calls of subprograms, a call for each of the
tasks (b), (c), (e), (f), (g). We need only three different subprograms: (1) to convert an index to
the corresponding Turing program (for steps b, f); (2) to simulate a TM (for steps c, g); and (3)
to compute the value of the function $f$ (for step e). We already know that subprograms (1) and
(2) can be constructed (see Sect. 6.2). The subprogram (3) is also at our disposal because, by
assumption, $f$ is computable. Thus, we have informally described an algorithm for constructing
the program of $T^{(i)}$. □


4. According to the Computability Thesis, there is a Turing machine $T_j$ that does
the same thing as the informal algorithm in the above proof, i.e., $T_j$ constructs
$T^{(i)}$ for arbitrary $i \in \mathbb{N}$. But this means that $T_j$ computes a computable function
that maps $\mathbb{N}$ into the set of all Turing machines (or rather, their indexes). In other
words, $T_j$’s proper function $\varphi_j$ is total.


Consequence. There is a computable function $\varphi_j : \mathbb{N} \to \mathbb{N}$ such that $T^{(i)} = T_{\varphi_j(i)}$.
Note that $\varphi_j$ does not execute steps a-g; it only returns the index of $T^{(i)}$, which
does execute these steps.

5. The unary proper function of $T_{\varphi_j(i)}$ is denoted by $\varphi_{\varphi_j(i)}$. Applied to $x$, it
computes the same value as $T_{\varphi_j(i)} = T^{(i)}$ in steps a-g, that is, the value $\varphi_{f(\varphi_i(i))}(x)$.
Consequently,


$$\varphi_{\varphi_j(i)}(x) = \varphi_{f(\varphi_i(i))}(x).$$

6. Recall that $i$ is arbitrary. So, let us take $i := j$. After substituting $i$ by $j$, we obtain


$$\varphi_{\varphi_j(j)}(x) = \varphi_{f(\varphi_j(j))}(x)$$

and, introducing $n := \varphi_j(j)$, we finally obtain


$$\varphi_n(x) = \varphi_{f(n)}(x).$$


This equation was our goal. Recall that in the equation f is an arbitrary computable 
function. We have proven the Recursion Theorem. 


Theorem 7.8. (Recursion Theorem) For every computable function $f$ there is
a natural number $n$ such that $\varphi_n \simeq \varphi_{f(n)}$. The number $n$ can be computed
from the index of the function $f$.


7.4.2 Interpretation of the Theorem 


What does the Recursion Theorem tell us? First, observe that from $\varphi_n \simeq \varphi_{f(n)}$ it does
not necessarily follow that $n = f(n)$. For example, $f$ defined by $f(k) = k + 1$ is a
computable function, yet there is no natural number $n$ such that $n = f(n)$. Rather,
the equality $\varphi_n \simeq \varphi_{f(n)}$ states that two partial computable functions, whose indexes
are $n$ and $f(n)$, are equal (i.e., $\varphi_n$ and $\varphi_{f(n)}$ have equal domains and return equal
values).

Second, recall that the indexes represent Turing programs (and hence Turing
machines). So we can interpret $f$ as a transformation of a Turing program represented
by some $i \in \mathbb{N}$ into a Turing program represented by $f(i)$. Since $f$ is supposed to be
total, it represents a transformation that modifies every Turing program in some way.
Now, in general, the original program (represented by $i$) and the modified program
(represented by $f(i)$) are not equivalent, i.e., they compute different proper
functions $\varphi_i$ and $\varphi_{f(i)}$. This is where the Recursion Theorem makes an entry; it states
that there exists a program (represented by some $n$) which is an exception to this
rule. Specifically:


If a transformation f modifies every Turing program, then 
some Turing program n is transformed into an equivalent Turing program f(n). 

In other words, if a transformation $f$ modifies every Turing machine, there is
always some Turing machine $T_n$ for which the modified machine $T_{f(n)}$ computes the
same function as $T_n$. Although the Turing programs represented by $n$ and $f(n)$ may
completely differ in their instructions, they nevertheless return equal results, i.e.,
compute the same function $\varphi$ ($\simeq \varphi_n \simeq \varphi_{f(n)}$). We have said that such machines
differ in their local behavior but are equal in their global behavior (see p. 158).

Can we find out the value of $n$, i.e., the (index of the) Turing program that is
modified by $f$ into the equivalent program? According to step 6 of the deduction we
have $n = \varphi_j(j)$, while according to step 3 $j$ can be computed and depends on the
function $f$ only. So $n$ depends only on $f$. Since $f$ is computable, $n$ can be computed.


7.4.3 Fixed Points of Functions 


The Recursion Theorem is also called the Fixed-Point Theorem, because the number
$n$ for which $\varphi_n \simeq \varphi_{f(n)}$ has become known as a fixed point of the function $f$.¹ Using
this terminology we can restate the theorem as follows.


Theorem 7.9. (Fixed-Point Theorem) Every computable function has a fixed point. 


(We will use this version in Sect. 9.3.) Now the following question naturally arises: 
Is the number $n$ that we constructed from the function $f$ the only fixed point of $f$?
The answer is no. Actually, there are countably infinitely many others. 


Theorem 7.10. A computable function has countably infinitely many fixed points. 


Proof. Assume that there exists a computable function $f$ with only finitely many fixed points.
Denote the finite set of $f$’s fixed points by $F$. Choose any partial computable function $\varphi_e$ with the
property that none of its indexes is in $F$, i.e., $\varphi_e \not\simeq \varphi_x$ for every $x \in F$. (In short, $\mathrm{ind}(\varphi_e) \cap F = \emptyset$.)
Now comes the tricky part: Let $g : \mathbb{N} \to \mathbb{N}$ be a function that is implicitly defined by


$$\varphi_{g(x)} \simeq \begin{cases} \varphi_e, & \text{if } x \in F; \\ \varphi_{f(x)}, & \text{otherwise.} \end{cases}$$


Although $g$ is not explicitly defined, we can determine the following two relevant properties of $g$:


1. The function $g$ is computable. The algorithm to compute $g(x)$ for an arbitrary $x$ has two steps:
(a) Decide whether $x \in F$. (This can always be done because $F$ is finite.) (b) If $x \in F$, then $g(x) := e$;
otherwise, compute $f(x)$ and assign it to $g(x)$. (This can always be done as $f$ is computable.)


2. The function $g$ has no fixed point. If $x \in F$, then $\varphi_{g(x)} \simeq \varphi_e$ (by definition) and $\varphi_e \not\simeq \varphi_x$ (property
of $\varphi_e$), so $\varphi_{g(x)} \not\simeq \varphi_x$. This means that none of the elements of $F$ is a fixed point of $g$. On the
other hand, if $x \notin F$, then $\varphi_{g(x)} \simeq \varphi_{f(x)}$ (by def.) and $\varphi_{f(x)} \not\simeq \varphi_x$ (as $x$ is not a fixed point of $f$),
which implies that $\varphi_{g(x)} \not\simeq \varphi_x$. This means that none of the elements of $\overline{F}$ is a fixed point of $g$.


So, $g$ is a computable function with no fixed points; this contradicts the Fixed-Point Theorem.


¹ In fact, what remains fixed under $f$ is the function $\varphi_n$ (or global behavior of $T_n$), not the index $n$.


Consequently, we can upgrade the interpretation of the Fixed-Point Theorem:


If a transformation f modifies every Turing program, then countably infinitely 
many Turing programs are transformed into equivalent Turing programs. 


7.4.4 Practical Consequences: Recursive Program Definition 


The Recursion Theorem allows a partial computable function to be defined using its
own index, as described in the following schematic definition:


$$\varphi_i(x) \stackrel{\text{def}}{=} [\ldots i \ldots x \ldots].$$


On the left-hand side there is a function $\varphi_i$, which is being defined, and on the
right-hand side there is a Turing program $P$, i.e., a sequence $[\ldots i \ldots x \ldots]$ of instructions,
describing how to compute the value of $\varphi_i$ at the argument $x$. Consider the variable
$i$ in $P$. Obviously, $i$ is also the index of the function being defined, $\varphi_i$. But an
index of a function is just a description of the program computing that function (see
Sect. 6.2.1). Thus, it looks as if the variable $i$ in $P$ describes the very program $P$,
and hence the index $i$ of the function being defined. But this is a circular definition,
because the object being defined (i.e., $i$) is a part of the definition!

On the other hand, we know that μ-recursive functions, i.e., the Gödel-Kleene
model of computation, allow constructions, and hence definitions, of functions in a
self-referencing manner by using the rule of primitive recursion (see Box 5.1, p. 82).
As a Turing machine is an equivalent model of computation, we expect that it, too,
allows self-reference. To prove that this is indeed so, we need both the Recursion
Theorem and the Parameter Theorem. Let us see how.

Notice first that the variables $i$ and $x$ have different roles in $P$. While $x$
represents an arbitrary natural number, i.e., the argument of the function being defined,
the variable $i$ represents an unknown but fixed natural number, i.e., the index of this
function. Therefore, $i$ can be treated as a (yet unknown) parameter in $P$. Now,
viewing the program $P$ as a function of one variable $x$ and one parameter $i$, we can apply
the Parameter Theorem and move the parameter $i$ to the index of some new function,
$\varphi_{s(i)}(x)$. Observe that this function is still defined by $P$:


$$\varphi_{s(i)}(x) \stackrel{\text{def}}{=} [\ldots i \ldots x \ldots]. \qquad (*)$$


The index $s(i)$ describes a program computing $\varphi_{s(i)}(x)$, but it does not appear in the
program $P$. Due to the Parameter Theorem, the function $s$ is computable. But then
the Recursion Theorem tells us that there is a natural number $n$ such that $\varphi_{s(n)} \simeq \varphi_n$.
So, let us take $i := n$ in the definition (*). Taking into account that $\varphi_{s(n)} \simeq \varphi_n$, we
obtain the following definition:


$$\varphi_n(x) \stackrel{\text{def}}{=} [\ldots n \ldots x \ldots]. \qquad (**)$$


Recall that $n$ is a particular natural number that is dependent on the function $s$.
Consequently, $\varphi_n$ is a particular partial computable function. This function has been
defined in (**) with its own index.

We have proved that there are partial functions computable according to Turing
that can be defined recursively. This finding is most welcome, because the Turing
machine as the model of computation does not exhibit the capability of supporting
self-reference and recursiveness as explicitly as the Gödel-Kleene model of
computation (i.e., the μ-recursive functions).


7.4.5 Practical Consequences: Recursive Program Execution 


Suppose that a partial computable function is defined recursively. Can the function 
be computed by a Turing machine? If so, what does the computation look like? 

Let us be given a function defined by $\varphi(x) = x!$, where $x \in \mathbb{N}$. How can $\varphi(x)$ be
defined and computed recursively by a Turing program $\delta$? For starters, recall that
$\varphi(x)$ can be recursively (inductively) defined by the following system of equations:


$$\varphi(x) = x \cdot \varphi(x-1), \quad \text{if } x > 0 \qquad (*)$$
$$\varphi(0) = 1.$$


For example, $\varphi(4) = 4 \cdot \varphi(3) = 4 \cdot 3 \cdot \varphi(2) = 4 \cdot 3 \cdot 2 \cdot \varphi(1) = 4 \cdot 3 \cdot 2 \cdot 1 \cdot \varphi(0) = 4 \cdot 3 \cdot 2 \cdot 1 \cdot 1 = 24$.


The definition (*) uncovers the actions that should be taken by the Turing
program $\delta$ in order to compute the value $\varphi(x)$. Let $\delta(a)$ denote an instance of the
execution of $\delta$ on input $a$. We call $\delta(a)$ the activation of $\delta$ on $a$; in this case, $a$ is
called the actual parameter of $\delta$. To compute the value $\varphi(a)$, the activation $\delta(a)$
should do, in succession, the following:


1. activate $\delta(a-1)$, i.e., call $\delta$ on $a-1$ to compute $\varphi(a-1)$;
2. multiply its actual parameter ($= a$) and the result $r$ ($= \varphi(a-1)$) of $\delta(a-1)$;
3. return the product (= the result $r$ of $\delta(a)$) to the caller.


In this fashion the Turing machine should execute every $\delta(a)$, $a = x, x-1, \ldots, 2, 1$.
The exception is the activation $\delta(0)$, whose execution is trivial and immediately
returns the result $r = 1$. Returns to whom? To the activation $\delta(1)$, which activated
$\delta(0)$ and has, since then, been waiting for $\delta(0)$’s result. This will enable $\delta(1)$ to
resume execution at step 2 and then, in step 3, return its own result. Similarly, the
following should hold in general: Every callee $\delta(a-1)$ should return its result $r$
($= \varphi(a-1)$) to the awaiting caller $\delta(a)$, thus enabling $\delta(a)$ to continue its execution.

We now describe how, in principle, activations are represented in a Turing
machine, what actions the machine must take to start a new activation of its program,
and how it collects the awaited result. For each activation $\delta(a)$, $a = x, \ldots, 2, 1$, the
following runtime actions must be taken:


a. Before control is transferred to $\delta(a)$, the activation record of $\delta(a)$ is created.


The activation record (AR) of $\delta(a)$ is a finite sequence of tape cells set aside
to hold the information that will be used or computed by $\delta(a)$. In the case of
$\varphi(x) = x!$, the AR of $\delta(a)$ contains the actual parameter $a$ and an empty field $r$
for the result ($= \varphi(a)$) of $\delta(a)$. We denote the AR of $\delta(a)$ by $[a, r]$. (See Fig. 7.4
(a).)


b. $\delta(a)$ activates $\delta(a-1)$ and waits for its result.


To do this, $\delta(a)$ creates, next to its AR, a new AR $[a-1, r]$ to be used by $\delta(a-1)$.
Note that $a-1$ is a value and $r$ is empty. Then, $\delta(a)$ moves the tape window to
$a-1$ of $[a-1, r]$ and changes the state of the control unit to the initial state $q_1$.
In this way, $\delta(a)$ stops its own execution and invokes a new instance of $\delta$. Since
the input to the invoked $\delta$ will be $a-1$ from $[a-1, r]$, the activation $\delta(a-1)$ will
start. (See Fig. 7.4 (b).)


c. $\delta(a-1)$ computes its result, stores it in its AR, and returns the control to $\delta(a)$.


The details are as follows. After $\delta(a-1)$ has computed its result $\varphi(a-1)$, it
writes it into $r$ of $[a-1, r]$. Then, $\delta(a-1)$ changes the state of the control unit
to some previously designated state, say $q_2$, called the return state. In this way,
$\delta(a-1)$ informs its caller that it has terminated. (See Fig. 7.4 (c).)


d. $\delta(a)$ resumes its execution. It reads $\delta(a-1)$’s result from $\delta(a-1)$’s AR, uses it
in its own computation, and stores the result in its AR.


Specifically, when the control unit has entered the return state, $\delta(a)$ moves the
tape window to $[a-1, r]$ of $\delta(a-1)$, copies $r$ ($= \varphi(a-1)$) into its own $[a, r]$, and
deletes $[a-1, r]$. Then $\delta(a)$ continues executing the rest of its program (steps 2
and 3) and finally writes the result ($= \varphi(a)$) into $r$ of its AR. (See Fig. 7.4 (d).)


Fig. 7.4 (a) An activation $\delta(a)$ and its activation record. (b) $\delta(a-1)$ starts computing $\varphi(a-1)$.
(c) $\delta(a)$ reads $\delta(a-1)$’s result. (d) $\delta(a)$ computes $\varphi(a)$ and stores it in its activation record


Example 7.1. (Computation of $\varphi(x) = x!$ on TM) Figure 7.5 shows the steps in the computation
of the value $\varphi(4) = 4! = 24$.


Fig. 7.5 Recursive computation of the value $\varphi(4) = 4!$ on a Turing machine (input: 4; result: $\varphi(4) = 24$)


Clearly, $\delta$ must take care of all runtime actions described in a, b, c, d.
Consequently, $\delta$ will consist of two sets of instructions:


• the instructions implementing the actions explicitly dictated by the algorithm (*);
• the instructions implementing the runtime actions a, b, c, d, implicit in (*).


To prove that there exists a TM that operates as described above, we refer to the
Computability Thesis and leave the construction of the corresponding program $\delta$ as
an exercise to the reader.


Looking at Fig. 7.5, we notice that adding and deleting of ARs on the tape of a 
Turing machine follow the rules characteristic of a data structure called the stack. 
The first AR that is created on the tape is at the bottom of the stack, and the rightmost 
existing AR is at the top of the stack. A new AR can be created next to the top AR; 
we say that it is pushed onto the stack. An AR can be deleted from the stack only if 
it is at the top of the stack; we say that it is popped off the stack. We will use these 
mechanisms shortly. 
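
The push and pop discipline can be made concrete with a small simulation. The following Python sketch is only an illustration and not the Turing program δ itself: each AR is modeled as a two-field list [a, r], which is an assumption mirroring the fields shown in Fig. 7.4, and the tape is modeled by a Python list used as a stack.

```python
def factorial_with_ars(x):
    """Compute x! with an explicit stack of activation records [a, r]."""
    if x < 0:
        raise ValueError("illegal input")
    stack = [[x, None]]                # the first AR is the bottom of the stack
    # Call phase: each activation pushes an AR for the argument a - 1.
    while stack[-1][0] > 0:
        a = stack[-1][0]
        stack.append([a - 1, None])    # push a new AR next to the top AR
    stack[-1][1] = 1                   # recursion-termination case: 0! = 1
    # Return phase: the top AR is popped and its result handed to the caller.
    while len(stack) > 1:
        _, r = stack.pop()             # pop the callee's AR
        a = stack[-1][0]
        stack[-1][1] = a * r           # the caller stores its own result
    return stack[0][1]


print(factorial_with_ars(4))
```

Only the top of the list is ever read or modified, which is exactly the restriction that makes the structure a stack.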

Later, we will often say that a TM T calls another TM T′ to do a task for T.
The call-and-return mechanism described above can readily be used to make T's
program δ call the program δ′ of T′, pass actual parameters to δ′, and collect δ′'s
results.


7.4.6 Practical Consequences: Procedure Calls in
General-Purpose Computers


The mechanisms described in the previous section are used in general-purpose com- 
puters to handle procedure calls during program execution. 


The Concept of the Procedure 


In general-purpose computers we often use procedures to decompose large pro- 
grams into components. The idea is that procedures have other procedures do sub- 
tasks. A procedure (callee) executes its task when it is called (invoked) by another 
procedure (caller). The caller can also be the operating system. A callee may return 
one or several values to its caller. Each procedure has its private storage, where it 
can access its private (local) variables that are needed to perform its task. 
High-level procedural programming languages support procedures. They make 
linkage conventions in order to define and implement in standard ways the key 
mechanisms for procedure management. These mechanisms are needed to: 


• invoke a procedure and map its actual parameters to the callee's private space, so
that the callee can use them as its input;

• return control to the caller after the callee's termination, so that the caller can
continue its execution immediately after the point of the call.


Most languages allow a procedure to return one or more values to the caller. 


After a program (source code) is written, it must be compiled. One of the tasks of 
the compiler is to embed into the generated code all the runtime algorithms and data 
structures that are necessary to implement the call-and-return behavior implicitly 
dictated by the source code. 

The code is then linked into the executable code, which is ready to be loaded into 
the computer’s main memory for execution. 

When the program is started, its call-and-return behavior can be modeled with a 
stack. In the simplest case, the caller pushes the return address onto the stack and 
transfers the control to the callee; when the callee returns, it pops the return address 
off the stack and transfers the control back to the caller at the return address. In 
reality, besides the return address there is more information passed between the two 
procedures. 


The Role of the Compiler 


We now see what the compiler must do. First, it must embed in the generated code
runtime algorithms that will correctly push onto and pop off the stack the information
that must be transferred between a caller and callee via the stack. Second, it
must establish a data structure that will contain this information. The structure is
called the activation record (AR). This must be implemented as a private block of
memory associated with a specific invocation of a specific procedure. Consequently,
the AR has a number of fields, each for one of the following data:


• return address, where the caller's execution will resume after the callee's termination;

• actual parameters, which are input data to the callee;

• register contents, which the caller preserves to resume its execution after return;

• return values, which the callee returns to the caller;

• local variables, which are declared and used by the callee;

• addressability information, which allows the callee to access non-local variables.
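
To make the list of fields tangible, here is a minimal Python sketch of an activation record and of one call and return on a runtime stack. The class name, the field names, and the use of an `access_link` reference for the addressability information are illustrative assumptions; real record layouts depend on the compiler and its linkage conventions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ActivationRecord:
    return_address: int                       # where the caller resumes
    actual_parameters: List[Any]              # input data for the callee
    saved_registers: Dict[str, Any] = field(default_factory=dict)
    return_values: List[Any] = field(default_factory=list)
    local_variables: Dict[str, Any] = field(default_factory=dict)
    access_link: Optional["ActivationRecord"] = None  # addressability information


runtime_stack: List[ActivationRecord] = []

caller = ActivationRecord(return_address=0, actual_parameters=[])
runtime_stack.append(caller)

# Call: a new AR is pushed for the callee (hypothetical return address 42).
callee = ActivationRecord(return_address=42, actual_parameters=[3], access_link=caller)
runtime_stack.append(callee)
callee.return_values.append(6)     # the callee stores its result in its AR

# Return: the callee's AR is popped; the caller's AR is on top again.
finished = runtime_stack.pop()
```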


The Role of the Operating System 


As described in Sect. 6.2.5, modern general-purpose computers and their operating 
systems allocate to each executable program, prior to its execution, a memory space 
that will be used by the program during its execution. Part of this memory is the 
runtime stack. During the execution of the program, an AR is pushed onto (popped 
off) the runtime stack each time a procedure is called (terminated). The current top 
AR contains the information associated with the currently executing procedure. In 
case the size of the runtime stack exceeds the limits set by the operating system, 
the operating system suspends the execution of the program and takes over. This 
can happen, for example, in a recursive program when a recursive procedure keeps 
invoking itself (because it does not meet its recursion-termination criterion). 
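
A runaway recursion of this kind is easy to provoke. The sketch below is only an analogy: in CPython the depth limit is enforced by the interpreter (see `sys.getrecursionlimit()`) rather than directly by the operating system, and the function names are hypothetical.

```python
import sys


def runaway(x):
    """A recursive procedure whose termination criterion is never met."""
    return runaway(x + 1)


def stops_with_recursion_error():
    """Return True if the runtime interrupts the runaway recursion."""
    try:
        runaway(0)
    except RecursionError:
        return True
    return False


print(sys.getrecursionlimit())        # the interpreter's current depth limit
print(stops_with_recursion_error())
```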


Execution of Recursive Programs 


So how would the program for computing the function φ(x) = x! execute on a
general-purpose computer?

First, we write the program in a high-level procedural programming language,
such as C or Java. In this we follow the recursive definition (*) on p. 166 of the function
φ(x). We obtain a source program similar to P below. Of course, P is recursive
(i.e., self-referencing) because the activation P(x) starts a new activation P(x − 1),
i.e., a new instance of P with a decreased actual parameter. The invocations of P
come to an end when the recursion-termination criterion (x = 0) is satisfied.


program P(x: integer) return integer;
{
    if x > 0 return x * P(x - 1) else
    if x = 0 return 1
    else Error(illegal_input)
}
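
For readers who want to run P, one possible transcription is given below. Python is used here only for brevity (the text names C or Java), and raising `ValueError` is our stand-in for Error(illegal_input).

```python
def P(x: int) -> int:
    """Recursive computation of x!, following the structure of program P."""
    if x > 0:
        return x * P(x - 1)     # new activation with a decreased parameter
    if x == 0:
        return 1                # recursion-termination criterion
    raise ValueError("illegal input")


print(P(4))
```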
7.5 Chapter Summary


A decidable set is also semi-decidable. If a set is decidable, so is its complement. If 
a set and its complement are semi-decidable, they are both decidable. A set is semi- 
decidable if and only if it is the domain of a computable function. If two sets are 
decidable, their union and intersection are decidable. If two sets are semi-decidable, 
their union and intersection are semi-decidable. 
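
The closure properties of decidable sets can be sketched as follows. The sketch assumes that a decider is modeled by a total Python predicate on natural numbers, which is a simplification; the names are illustrative.

```python
from typing import Callable

Decider = Callable[[int], bool]   # total predicate: always answers True or False


def complement(d: Decider) -> Decider:
    return lambda x: not d(x)


def union(d1: Decider, d2: Decider) -> Decider:
    return lambda x: d1(x) or d2(x)


def intersection(d1: Decider, d2: Decider) -> Decider:
    return lambda x: d1(x) and d2(x)


def is_even(x: int) -> bool:
    return x % 2 == 0


def is_multiple_of_3(x: int) -> bool:
    return x % 3 == 0


even_or_mult3 = union(is_even, is_multiple_of_3)
even_and_mult3 = intersection(is_even, is_multiple_of_3)
odd = complement(is_even)
```

For semi-decidable sets the union cannot be built this simply: a recognizer may fail to halt, so the two recognizers have to be run in an interleaved fashion rather than one after the other.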

The Padding Lemma states that a partial computable function has countably in- 
finitely many indexes; given one of them, countably many other indexes of the func- 
tion can be generated. The index set of a partial computable function contains all of 
its indexes, that is, all the encoded Turing programs that compute the function. 

The Parameter (s-m-n) Theorem states that the parameters of a function can be 
built into the function’s Turing program to obtain a new Turing program computing 
the function without parameters. The new Turing program can be algorithmically 
constructed from the old program and the parameters only. The generalization of 
the theorem is called the s-m-n Theorem. 
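
The idea behind the Parameter Theorem can be illustrated on program texts. In the following sketch, which is an analogy and not the construction used in the proof, programs are Python source strings, and the hypothetical function `s` builds a new one-argument program from an old two-argument program and a parameter value. The names `prog` and `specialized` are assumptions of the sketch.

```python
def s(program_src: str, x) -> str:
    """Return the source of a one-argument program with x built in.

    Assumes that program_src defines a function named prog(x, y).
    Note that s only manipulates text; it never runs the old program.
    """
    return program_src + f"\n\ndef specialized(y):\n    return prog({x!r}, y)\n"


def run(src: str, func_name: str, *args):
    """Load a program text and apply the named function to args."""
    namespace = {}
    exec(src, namespace)
    return namespace[func_name](*args)


ADD_SRC = "def prog(x, y):\n    return x + y\n"

new_src = s(ADD_SRC, 5)
print(run(new_src, "specialized", 10))   # same value as prog(5, 10)
```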

The Recursion Theorem states that if a transformation transforms every Turing 
program, then some Turing program is transformed into an equivalent Turing pro- 
gram. The Recursion Theorem allows a Turing-computable function to be defined 
recursively, i.e., with its own Turing program. The Turing machine model of compu- 
tation supports the recursive execution of Turing programs. This lays the theoretical 
grounds for the recursive execution of programs in general-purpose computers and, 
more generally, for the call-and-return mechanism that allows the use of procedures 
in programs. 


Problems 


7.1. Show that: 


(a) All tasks in the proof of the Parameter Theorem are computable; 
[Hint. See Box 7.1.] 


(b) The function s in the Parameter Theorem is injective. 


7.2. Prove the following consequences of the Parameter Theorem:
(a) There is a computable function f such that rng(φ_{f(e)}) = dom(φ_e);

[Hint. Suppose that there existed a p.c. function defined by

    φ(e,x) = x, if x ∈ dom(φ_e);
             ↑, otherwise.

Viewing φ(e,x) as a function of x, we would have rng(φ(e,x)) = dom(φ_e). But φ(e,x)
does exist; it is computed by a TM that simulates the universal Turing machine U on e,x
and outputs x if U halts.]


(b) There is a computable function g such that dom(φ_{g(e)}) = rng(φ_e);
(c) A set S is c.e. ⟺ S is the range of a p.c. function.


Remark. A similar statement holds for the domain of a p.c. function (see Theorem 6.6, p. 151).


7.3. Prove the following consequences of the Recursion Theorem:


(a) There is a TM that on any input outputs its own index and nothing else.

(In other words: There is an n ∈ ℕ such that φ_n is the constant function with value n,
i.e., rng(φ_n) = {n}.)

[Hint. Denote by κ_c the constant function κ_c : ℕ → {c}, where c ∈ ℕ is an arbitrary number.
Since κ_c is a computable function, it is also p.c., and hence κ_c ≃ φ_k for some k ∈ ℕ.
Clearly, k depends on c. Prove that there is a computable function f such that k = f(c).
Then κ_c ≃ φ_{f(c)} for arbitrary c. Now, since f is computable, the Recursion Theorem
guarantees that there exists an n for which φ_n ≃ φ_{f(n)}. But φ_{f(n)} ≃ κ_n and hence φ_n ≃ κ_n.
It follows that rng(φ_n) = {n}.]


Remark. We can view such a TM as a Turing program that prints out its own code.
Generally, any program that prints out its own copy is called a quine.² Quines exist in
any programming language that has the ability to output any computable string.


(b) There are two different Turing machines that output each other's indexes and nothing else.

(That is, there are m,n ∈ ℕ, m ≠ n, such that φ_m is the constant function with value n,
and φ_n is the constant function with value m, i.e., rng(φ_m) = {n} ∧ rng(φ_n) = {m}.)


7.4. Prove the following consequences of the Recursion Theorem:


(a) For every computable function f, there exists an n ∈ ℕ such that W_n = W_{f(n)}.
(Recall from Definition 6.4, p. 135, that W_i ≝ dom(φ_i).)
(b) There is an n ∈ ℕ such that W_n = {n}.
(c) There is an n ∈ ℕ such that W_n = {n²}.


(d) There is an n ∈ ℕ such that TMs T_n and T_{n+1} both compute the same function,
that is, φ_n ≃ φ_{n+1}.


(e) For every p.c. function φ(e,x), there is an n ∈ ℕ such that φ(n,x) = φ_n(x).


7.5. Theorem 7.10, p. 164, states that a computable function f has countably infinitely many fixed
points. Recall that we proved the theorem in a non-constructive way; that is, we didn't describe
a generator that would, given one fixed point of f, list (i.e., enumerate) countably infinitely
many other fixed points of f.


(a) Can you describe an algorithm to generate an infinite set of fixed points of f?

[Hint. Using f, construct a strictly increasing computable function s : ℕ → ℕ such that
if i is a fixed point of f then s(i) is a fixed point of f, i.e., φ_i ≃ φ_{f(i)} ⟹ φ_{s(i)} ≃ φ_{f(s(i))}.]


(b) Problem (a) has asked you to construct, given a computable function f, a Turing machine
that will generate an infinite set {i, s(i), s(s(i)), ...} of some of the fixed points of f. But
can we generate the set of all fixed points of f? The answer is that this cannot be done
for every f. Try to prove the following:


There exists a computable function f whose set of fixed points is not a c.e. set.


7.6. We say that a TM T_1 calls another TM T_2 to perform a subtask for T_1. We view this as the
program δ_1 calling the program (procedure) δ_2. Explain in detail how such a procedure call
can be implemented on Turing machines.


[Hint. Merge programs δ_1 and δ_2 into a program δ, which will use activation records
to activate δ_2 (see Sect. 7.4.5).]


² In honour of Willard Van Orman Quine, 1908-2000, American philosopher and logician.
Chapter 8
Incomputable Problems


A problem is unsolvable if there is no single procedure that can 
construct a solution for an arbitrary instance of the problem. 


Abstract After the basic notions of computation had been formalized, a close link 
between computational problems—in particular, decision problems—and sets was 
discovered. This was important because the notion of the set was finally settled, and 
sets made it possible to apply diagonalization, a proof method already discovered 
by Cantor. Diagonalization, combined with self-reference, made it possible to dis- 
cover the first incomputable problem, i.e., a decision problem called the Halting 
Problem, for which there is no single algorithm capable of solving every instance of 
the problem. This was simultaneously and independently discovered by Church and 
Turing. After this, Computability Theory blossomed, so that in the second half of 
the 1930s one of the main questions became, “Which computational problems are 
computable and which ones are not?” Indeed, using various proof methods, many 
incomputable problems were discovered in different fields of science. This showed 
that incomputability is a constituent part of reality. 


8.1 Problem Solving 


In previous chapters we have discussed how the values of functions can be computed,
how sets can be generated, and how sets can be recognized. All of these
are elementary computational tasks in the sense that they are all closely connected
with the computational model, i.e., the Turing machine. However, in practice we are
also confronted with other kinds of problems that require certain computations to
yield their solutions.¹ All such problems we call computational problems. Now the
following question immediately arises: “Can we use the accumulated knowledge
about how to solve the three elementary computational tasks to solve other kinds of
computational problems?” In this section we will explain how this can be done.


¹ Obviously, we are not interested in psychological, social, economic, political, philosophical, and
other related problems whose solutions require reasoning beyond (explicit) computation.


8.1.1 Decision Problems and Other Kinds of Problems


Before we start searching for the answer to the above question, we must define 
precisely what we mean by other “kinds” of computational problems. It is a well- 
known fact, based upon our everyday experience, that there is a myriad of different 
computational problems. By grouping all the “similar” problems into classes, we 
can try to put this jungle of computational problems in order (a rather simple one, 
actually). The “similarity” between problems can be defined in several ways. For 
example, we could define a class of all the problems asking for numerical solutions, 
and another class of all the other problems. However, we will proceed differently. 
We will define two problems to be similar if their solutions are “equally simple.” 
Now, “equally simple” is a rather fuzzy notion. Fortunately, we will not need to 
formalize it. Instead, it will suffice to define the notion informally. So let us define 
the following four kinds (i.e., classes) of computational problems: 


• Decision problems (also called yes/no problems). The solution of a decision
problem is the answer YES or NO. The solution can be represented by a single
bit (e.g., 1 = YES, 0 = NO).

Examples: Is there a prime number in a given set of natural numbers? Is there a
Hamiltonian cycle in a given graph?


• Search problems. The solution of a search problem is an element of a given set
S such that the element has a given property P. The solution is an element of a
set.

Examples: Find the largest prime number in a given set of natural numbers. Find
the shortest Hamiltonian cycle in a given weighted graph.


• Counting problems. The solution of a counting problem is the number of elements
of a given set S that have a given property P. The solution is a natural
number.

Examples: How many prime numbers are in a given set of natural numbers? How
many Hamiltonian cycles are in a given graph?


• Generation problems (also called enumeration problems). The solution of a
generation problem is a list of elements of a given set S that have a given property
P. The solution is a sequence of elements of a set.

Examples: List all the prime numbers in a given set of natural numbers. List all
the Hamiltonian cycles of a given graph.
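
The four kinds of problems can be contrasted on the prime-number examples above. The following Python sketch assumes that the given set is finite; the function names and the sample set are ours.

```python
import math


def is_prime(n: int) -> bool:
    """Trial division; adequate for small illustrative inputs."""
    return n >= 2 and all(n % d != 0 for d in range(2, math.isqrt(n) + 1))


def decide(numbers) -> bool:
    """Decision problem: is there a prime number in the set?"""
    return any(is_prime(n) for n in numbers)


def search(numbers):
    """Search problem: the largest prime in the set (None if there is none)."""
    primes = [n for n in numbers if is_prime(n)]
    return max(primes) if primes else None


def count(numbers) -> int:
    """Counting problem: how many primes are in the set?"""
    return sum(1 for n in numbers if is_prime(n))


def generate(numbers) -> list:
    """Generation problem: list all primes in the set."""
    return sorted(n for n in numbers if is_prime(n))


sample = {4, 7, 9, 11, 15}
print(decide(sample), search(sample), count(sample), generate(sample))
```

The four answers differ in type: a single bit, an element, a natural number, and a sequence.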


Which of these kinds of problems should we focus on? Here we make a prag- 
matic choice and focus on the decision problems, because these problems ask for the 
simplest possible solutions, i.e., solutions representable by a single bit. Our choice 
does not imply that other kinds of computational problems are not interesting—we 
only want to postpone their treatment until the decision problems are better under- 
stood. 
8.1.2 Language of a Decision Problem


In this subsection we will show that there is a close link between decision problems
and sets. This will enable us to reduce questions about decision problems to questions
about sets. We will uncover the link in five steps.


1. Let D be a decision problem. 


2. In practice we are usually confronted with a particular instance d of the problem 
D. The instance d can be obtained from D by replacing the variables in the defi- 
nition of D with actual data. Thus, the problem D can be viewed as the set of all 
the possible instances of this problem. We say that an instance d € D is positive 
or negative if the answer to d is YES or NO, respectively. 


Example 8.1. (Problem Instance) Let D_Prime = “Is n a prime number?” be a decision problem.
If we replace the variable n by a particular natural number, say 4, we obtain the instance
d = “Is 4 a prime number?” of D_Prime. This instance is negative because its solution is the
answer NO. In contrast, since 2009 we have known that the solution to the instance d = “Is
2^43,112,609 − 1 a prime number?” is YES, so the instance is positive.


So: 
Let d be an instance of D. 


3. Any instance of a decision problem is either positive or negative (due to the Law
of Excluded Middle; see Sect. 2.1.2). So the answer to the instance d is either
YES or NO. But how can we compute the answer, say on the Turing machine?
In the natural-language description of d there can be various data, e.g., numbers,
matrices, graphs. But in order to compute the answer on a machine, be it
a modern computer or an abstract model such as the Turing machine, we must
rewrite these data in a form that is understandable to the machine. Since any machine
uses its own alphabet Σ (e.g., Σ = {0,1}), we must choose a function that
will transform every instance of D into a word over Σ. Such a function is called
the coding function and will be denoted by code. Therefore, code : D → Σ*, so
code(D) is the set of codes of all instances of D. We will usually write ⟨d⟩ instead
of the longer code(d). There are many different coding functions. From the
point of view of Computability Theory, we can choose any of them as long as
the function is computable on D and injective. These requirements² are natural
because we want the coding process to halt and we do not want a word to encode
different instances of D.


Example 8.2. (Instance Code) The instances of the problem D_Prime = “Is n a prime number?”
can be encoded by the function code : D_Prime → {0,1}* that rewrites n in its binary representation;
e.g., code(“Is 4 a prime number?”) = 100 and code(“Is 5 a prime number?”) = 101.


So:
Let code : D → Σ* be a coding function.


² In Computational Complexity Theory we also require that the coding function does not produce
unnecessarily long codes.
4. After we have encoded d into ⟨d⟩, we could start searching for a Turing machine
capable of computing the answer to d when given the input ⟨d⟩ ∈ code(D). But
we proceed differently: We gather the codes of all the positive instances of D in
a set and denote it by L(D). This is a subset of Σ*, so it is a formal language (see
Sect. 6.3.6). Here is the definition.


Definition 8.1. (Language of a Decision Problem) The language of a decision
problem D is the set L(D) ≝ {⟨d⟩ ∈ Σ* | d is a positive instance of D}.


Example 8.3. (Language of a Decision Problem) The language of the decision problem
D_Prime is the set L(D_Prime) = {10,11,101,111,1011,1101,10001,10011,10111,11101,...}.


5. Now the following is obvious:
An instance d of D is positive ⟺ ⟨d⟩ ∈ L(D). (*)


What did we gain by this? The equivalence (*) tells us that computing the answer
to d can be substituted with deciding whether or not ⟨d⟩ is in L(D). (See Fig. 8.1.)
Thus we have found a connection between decision problems and sets:


Solving a decision problem D can be reduced to deciding membership of the set
L(D) in Σ*.
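
For the running example, the problem "Is n a prime number?", the steps above can be sketched in Python. The coding function follows Example 8.2; the function names and the use of trial division are assumptions of the sketch.

```python
import math


def code(n: int) -> str:
    """Coding function of Example 8.2: the binary representation of n."""
    return format(n, "b")


def in_language(word: str) -> bool:
    """Decide whether word belongs to the language of the prime-number problem."""
    if not word or set(word) - {"0", "1"} or word[0] == "0":
        return False            # not the code of a positive instance
    n = int(word, 2)
    return n >= 2 and all(n % d != 0 for d in range(2, math.isqrt(n) + 1))


def solve(n: int) -> bool:
    """Answer the instance 'Is n a prime number?' via membership in the language."""
    return in_language(code(n))


print([code(n) for n in range(2, 30) if solve(n)])
```

The printed words can be compared with the initial elements listed in Example 8.3.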


Fig. 8.1 The answer to the instance d of a decision problem D can be obtained by
determining where the word ⟨d⟩ is relative to the set L(D)


The connection (*) is important because it enables us to apply (when solving decision
problems) all the theory developed to decide sets. Recall that deciding a set
means determining what is in the set and what is in its complement (Sect. 6.3.3).
Now let us see what the decidability of L(D) in Σ* tells us about the solvability of
the decision problem D:


• Let L(D) be decidable. Then there exists a decider D_{L(D)}, which, for arbitrary
⟨d⟩ ∈ Σ*, answers the question ⟨d⟩ ∈? L(D). Because of (*), the answer tells
us whether d is a positive or a negative instance of D. Consequently, there is
an algorithm that, for the arbitrary instance d ∈ D, decides whether d is positive
or negative. The algorithm is D_{L(D)}(code(·)), the composition of the coding
function and the decider of L(D).

• Let L(D) be semi-decidable. Then, for arbitrary ⟨d⟩ ∈ L(D), the recognizer R_{L(D)}
answers the question ⟨d⟩ ∈? L(D) with YES. However, if ⟨d⟩ ∉ L(D), R_{L(D)} may
or may not return NO in finite time. So: There is an algorithm that for an arbitrary
positive instance d ∈ D finds that d is positive. The algorithm is R_{L(D)}(code(·)).


• Let L(D) be undecidable. Then there is no algorithm capable of answering, for
arbitrary ⟨d⟩ ∈ Σ*, the question ⟨d⟩ ∈? L(D). Because of (*), there is no algorithm
capable of deciding, for an arbitrary instance d ∈ D, whether d is positive
or negative.


So we can extend our terminology about sets (Definition 6.10) to decision problems. 


Definition 8.2. Let D be a decision problem. We say that the problem 


D is decidable (or computable) if L(D) is a decidable set; 
D is semi-decidable if L(D) is a semi-decidable set; 
D is undecidable (or incomputable) if L(D) is an undecidable set. 


NB Instead of a(n) (un)decidable problem we can say (in)computable problem. But 
bear in mind that the latter notion is more general: It can be used with all kinds 
of computational problems, not only with decision problems (Sects. 9.3 and 9.5). 
So, an (un)decidable problem is the same as an (in)computable decision problem. 
The term (un)solvable from this section’s motto is even more general: It addresses 
all kinds of computational and non-computational problems. (See Fig. 8.2.) 


[Figure: nested classes of general problems, computational problems, and decision problems,
with the labels solvable/unsolvable and computable/incomputable.]

Fig. 8.2 The relationship between different kinds of problems


8.1.3 Subproblems of a Decision Problem 


Often we encounter a decision problem that is a special version of another, more 
general decision problem. Is there any connection between the languages of the two 
problems? Is there any connection between the decidabilities of the two problems? 
Let us start with a definition. 
Definition 8.3. (Subproblem) A decision problem D_Sub is a subproblem of a
decision problem D_Prob if D_Sub is obtained from D_Prob by imposing additional
restrictions on (some of) the variables of D_Prob.


A restriction can be put in various ways: A variable can be assigned a particular
value; or it can be restricted to take only the values from a given set of values; or
a relation between a variable and another variable can be imposed. For instance, if
D_Prob has the variables x and y, then we might want to impose the equality x = y.
(We will do this in the next section.) It should be clear that the following hold:


D_Sub is a subproblem of D_Prob ⟹ code(D_Sub) ⊆ code(D_Prob)
D_Sub is a subproblem of D_Prob ⟹ L(D_Sub) ⊆ L(D_Prob)


Of course, we assume that we can decide whether or not d ∈ D_Sub, for any d ∈ D_Prob.
It is easy to prove the next theorem.


Theorem 8.1. Let D_Sub be a subproblem of a decision problem D_Prob. Then:
D_Sub is undecidable ⟹ D_Prob is undecidable


Proof. Let D_Sub be undecidable and suppose that D_Prob is decidable. We could use the algorithm
for solving D_Prob to solve (any instance of) D_Sub, so D_Sub would be decidable. Contradiction.
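
The step used in the proof, namely that an algorithm for the general problem also solves the subproblem, can be seen on a toy pair of decidable problems. Both problems below are hypothetical illustrations chosen by us; they are not taken from the text.

```python
def decide_prob(x: int, y: int) -> bool:
    """Decider for a general problem: 'Given x and y, does y divide x?'"""
    return y != 0 and x % y == 0


def decide_sub(x: int) -> bool:
    """Decider for the subproblem obtained by restricting y to the value 2."""
    return decide_prob(x, 2)


print(decide_sub(10), decide_sub(7))
```

Read contrapositively, this is Theorem 8.1: if no such `decide_sub` can exist, then no `decide_prob` can exist either.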


8.2 There Is an Incomputable Problem — Halting Problem 


We have just introduced the notions of decidable, undecidable, and semi-decidable
decision problems. But the attentive reader has probably noticed that at this point
we actually do not know whether there exists any undecidable or any semi-decidable
(yet not decidable) problem. If there is no such problem, then the above definition
is in vain, so we should throw it away, together with our attempt to explore undecidable
and semi-decidable (yet not decidable) problems. (In this case the definition
of undecidable and semi-decidable (yet not decidable) sets would also be superfluous.)
In other words, the recently developed part of Computability Theory should
be abandoned. Therefore, the important question at this point is this: “Is there any
decision problem that is undecidable or semi-decidable (yet not decidable)?”

To prove the existence of such a problem, we should find a decision problem
D such that the set L(D) is undecidable or semi-decidable (but not decidable). But
how can we find such a D (if there is one at all)? In 1936, Turing succeeded in this.³
How did he do that?

Turing was well aware of the fact that difficulties in obtaining computational
results are caused by those Turing machines that may not halt on some input data.
It would therefore be beneficial, he reckoned, if we could check, for any Turing
machine T and any input word w, whether or not T halts on w. If such a check were
possible, then, given an arbitrary pair (T,w), we would first check the pair (T,w)
and then, depending on the outcome of the check, we would either start T on w,
or try to improve T so that it would halt on w, too. This led Turing to define the
following decision problem.


³ Independently and at the same time, Church also found such a problem.


Definition 8.4. (Halting Problem) The Halting Problem D_Halt is defined by
D_Halt = “Given a Turing machine T and a word w ∈ Σ*, does T halt on w?”


Turing then proved the following theorem.


Theorem 8.2. The Halting Problem D_Halt is undecidable.


Before we go into the proof, we introduce two sets that will play an important
role in the proof and, indeed, in the rest of the book. The sets are called the universal
and diagonal languages, respectively.


Definition 8.5. (Universal Language K_0) The universal language, denoted⁴ K_0,
is the language of the Halting Problem, that is,

K_0 ≝ L(D_Halt) = {⟨T,w⟩ | T halts on w}


The second language, K, is obtained from K_0 by imposing the restriction w = ⟨T⟩.


Definition 8.6. (Diagonal Language K) The diagonal language, denoted K,
is defined by

K ≝ {⟨T,T⟩ | T halts on ⟨T⟩}


Observe that K is the language L(D_H) of the decision problem
D_H = “Given a Turing machine T, does T halt on ⟨T⟩?”
The problem D_H is a subproblem of D_Halt, since it is obtained from D_Halt by
restricting the variable w to w = ⟨T⟩.


We can now proceed to the proof of Theorem 8.2.


⁴ The reasons for the notation K_0 are historical. Namely, Post proved that this language is Turing-complete.
We will explain the notion of Turing-completeness in Part III of the book; see Sect. 14.1.
Proof. (of the theorem) The plan is to prove (in a lemma) that K is an undecidable
set; this will then imply that K_0 is also an undecidable set and, consequently,
that D_Halt is an undecidable problem. The lemma is instructive because it applies a
cunningly defined Turing machine to its own code.


Lemma 8.1. (Undecidability of K) The set K is undecidable.


Proof. (of the lemma) Suppose that K is a decidable set. Then there must exist a
decider D_K, which, for arbitrary T, answers the question ⟨T,T⟩ ∈? K with

    D_K(⟨T,T⟩) = YES, if T halts on ⟨T⟩;
                 NO, if T does not halt on ⟨T⟩.

Now we construct a new Turing machine S. The intention is to construct S in
such a way that, when given as input its own code ⟨S⟩, S will expose the incapability
of D_K to predict whether or not S will halt on ⟨S⟩. The machine S is depicted in
Fig. 8.3.


[Figure: The shrewd Turing machine S uncovers the incapability of D_K to answer whether S halts
on its own code ⟨S⟩. The supposed machine D_K answers, for arbitrary T, whether T halts
on its own code ⟨T⟩.]

Fig. 8.3 S exposes the incapability of the decider D_K to predict the halting of S on ⟨S⟩


The machine S operates as follows. The input to S is the code ⟨T⟩ of an arbitrary
Turing machine T. The machine S doubles ⟨T⟩ into ⟨T,T⟩, sends this to the decider
D_K, and starts it. The decider D_K eventually halts on ⟨T,T⟩ and answers either YES
or NO to the question ⟨T,T⟩ ∈? K. If D_K has answered YES, then S asks D_K again
the same question. If, however, D_K has answered NO, then S outputs its own answer
YES and halts.

But S is shrewd: if given as input its own code ⟨S⟩, it puts the supposed decider
D_K in insurmountable trouble. Let us see why. Given the input ⟨S⟩, S doubles it into
⟨S,S⟩ and hands it over to D_K, which in finite time answers the question ⟨S,S⟩ ∈? K
with either YES or NO. Now let us analyze the consequences of each of the answers:


a. Suppose that D_K has answered with D_K(⟨S,S⟩) = YES. Then S repeats the question
⟨S,S⟩ ∈? K to D_K, which in turn stubbornly repeats its answer D_K(⟨S,S⟩)
= YES. It is obvious that S cycles and will never halt. But note that, at the same
time, D_K repeatedly predicts just the opposite, i.e., that S will halt on ⟨S⟩. We
must conclude that in the case (a) the supposed decider D_K fails to compute the
correct answer.

b. Suppose that D_K has answered with D_K(⟨S,S⟩) = NO. Then S returns to the
environment its own answer and halts. But observe that just as before D_K has
computed the answer D_K(⟨S,S⟩) = NO and thus predicted that S will not halt
on ⟨S⟩. We must conclude that in the case (b) the supposed decider D_K fails to
compute the correct answer.


Thus, the supposed decider D_K is unable to correctly decide the question ⟨S,S⟩ ∈? K.
This contradicts our supposition that K is a decidable set and D_K its decider. We
must conclude that K is an undecidable set. The lemma is proved.


Because K is undecidable, so is the problem D_H. Now D_H is a subproblem of
the Halting Problem D_Halt, and code(D_H) is a subset of code(D_Halt). Then, by
Theorem 8.1, the Halting Problem is undecidable too. This completes the proof of
Theorem 8.2.
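
The construction of S can be mimicked in Python to see how S defeats any particular candidate for the decider. This is a sketch under loud assumptions: Python functions stand in for Turing machines, a function object stands in for its own code, the two candidate deciders are hypothetical, and the `max_rounds` cap exists only so that the demonstration terminates (in the proof, S would cycle forever). The sketch illustrates the argument; it does not replace it.

```python
def make_S(claimed_decider, max_rounds=1000):
    """Build the shrewd machine S from a claimed decider for K."""
    def S(code):
        rounds = 0
        # As long as the decider answers YES, S asks the same question again.
        while claimed_decider(code, code) == "YES":
            rounds += 1
            if rounds >= max_rounds:      # demo-only cap: report that S cycles
                return "LOOPS"
        return "YES"                      # the decider said NO: S halts
    return S


def always_yes(machine, word):
    return "YES"


def always_no(machine, word):
    return "NO"


def is_refuted(claimed_decider) -> bool:
    """Check that the candidate mispredicts the behavior of S on its own code."""
    S = make_S(claimed_decider)
    predicted_halting = claimed_decider(S, S) == "YES"
    actually_halts = S(S) != "LOOPS"
    return predicted_halting != actually_halts


print(is_refuted(always_yes), is_refuted(always_no))
```

Whatever the candidate answers on the pair consisting of S and S, the behavior of S is arranged to be the opposite, which is the content of cases (a) and (b).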


As noted, the sets K_0 and K are called the universal and diagonal languages, respectively.
The reasons for such a naming are explained in the following subsection.


8.2.1 Consequences: The Basic Kinds of Decision Problems 


Now we know that besides decidable problems there also exist undecidable prob- 
lems. What about semi-decidable problems? Do they exist? The answer is clearly 
yes, because every decidable set is by definition semi-decidable. So the real question 
is: Are there undecidable problems that are semi-decidable? That is, are there deci- 
sion problems such that only their positive instances are guaranteed to be solvable? 
The answer is yes. Let us see why. 


Theorem 8.3. The set $K_0$ is semi-decidable.


Proof. We must prove that there is a recognizer for $K_0$ (see Sect. 6.3.3). Let us conceive the
following machine: Given an arbitrary input $\langle T,w\rangle \in \Sigma^*$, the machine must simulate T on w, and
if the simulation halts, the machine must return YES and halt. So, if such a machine exists, it will
answer YES iff $\langle T,w\rangle \in K_0$. But we already know that such a machine exists: It is the Universal
Turing machine U (see Sect. 6.2.2). Hence, $K_0$ is semi-decidable.
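The idea of "simulate and answer YES if the simulation halts" can be sketched in Python. The encoding below (a transition table as a dictionary, the name `recognizes`, the optional step budget) is an assumption made for illustration; it is not the book's Universal Turing machine. The budget exists only so that the example can be run safely; a true recognizer corresponds to `max_steps=None`.

```python
def recognizes(delta, start, finals, word, blank="_", max_steps=None):
    """Simulate a one-tape Turing machine on `word`.

    delta maps (state, symbol) -> (new_state, written_symbol, move), move in {-1, +1}.
    The simulated machine halts when it enters a final state or has no instruction.
    Returns True as soon as it halts. With max_steps=None the loop never ends on a
    non-halting machine (as for any recognizer); with a finite budget the function
    returns None, meaning "no answer yet", once the budget is used up.
    """
    tape = dict(enumerate(word))
    state, pos, steps = start, 0, 0
    while state not in finals and (state, tape.get(pos, blank)) in delta:
        if max_steps is not None and steps >= max_steps:
            return None
        new_state, written, move = delta[(state, tape.get(pos, blank))]
        tape[pos] = written
        state = new_state
        pos += move
        steps += 1
    return True


# Hypothetical machine: scans right, halts when it sees a 1, runs forever otherwise.
delta = {
    ("q", "0"): ("q", "0", +1),
    ("q", "_"): ("q", "_", +1),
    ("q", "1"): ("halt", "1", +1),
}
print(recognizes(delta, "q", {"halt"}, "001"))                  # halts, so YES
print(recognizes(delta, "q", {"halt"}, "000", max_steps=1000))  # no answer within budget
```

Note that the second call can never be turned into a NO: exhausting a budget says nothing about what happens later, which is why this procedure only recognizes and does not decide.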


Theorems 8.2 and 8.3 imply that $K_0$ is an undecidable semi-decidable set.


The proof of Theorem 8.3 has also revealed that $K_0$ is the proper set of the Universal
Turing machine. This is why $K_0$ is called the universal language.


What about the set $\overline{K_0}$, the complement of $K_0$?


Theorem 8.4. The set $\overline{K_0}$ is not semi-decidable.


Proof. If $\overline{K_0}$ were semi-decidable, then both $K_0$ and $\overline{K_0}$ would be semi-decidable. Then $K_0$ would
be decidable because of Theorem 7.3 (p. 156). But this would contradict Theorem 8.2. So, the set
$\overline{K_0}$ is not semi-decidable.
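The step "both semi-decidable, hence decidable" rests on running two recognizers in alternation. A minimal Python sketch follows, assuming that recognizers are modelled as generators that yield while still working and finish when they accept; the names `decide`, `recognize_even` and `recognize_odd` are hypothetical, and the even/odd set is a toy stand-in.

```python
def recognize_even(x):
    """Toy recognizer: finishes (accepts) if x is even, runs forever otherwise."""
    while x % 2 != 0:
        yield  # one more step of useless work


def recognize_odd(x):
    """Toy recognizer for the complement: accepts odd x, runs forever on even x."""
    while x % 2 == 0:
        yield


def decide(x, rec_set, rec_complement):
    """Alternate single steps of both recognizers; exactly one of them must accept."""
    run_set, run_complement = rec_set(x), rec_complement(x)
    while True:
        try:
            next(run_set)
        except StopIteration:
            return True   # x is in the set
        try:
            next(run_complement)
        except StopIteration:
            return False  # x is in the complement


print(decide(10, recognize_even, recognize_odd), decide(7, recognize_even, recognize_odd))
```

The loop always terminates because every x belongs to the set or to its complement. For $K_0$ no such second recognizer can exist, since otherwise this construction would decide $K_0$.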


Similarly, we prove that the undecidable set K is semi-decidable, and that $\overline{K}$ is
not semi-decidable.

We have proved the existence of undecidable semi-decidable sets (e.g., $K_0$ and
K) and the existence of undecidable sets that are not even semi-decidable (e.g.,
$\overline{K_0}$ and $\overline{K}$). We now know that the class of all sets partitions into three nonempty
subclasses, as depicted in Fig. 8.4. (Of course, we continue to talk about sets that
are subsets of $\Sigma^*$ or $\mathbb{N}$, as explained in Sect. 6.3.5.)


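[Figure: the class of all sets, with regions labeled decidable, semi-decidable, and undecidable; the sets $K_0$, $K$, $\overline{K_0}$, and $\overline{K}$ are marked in the undecidable region.]

Fig. 8.4 The three main classes of sets according to their decidability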


The Class of All Decision Problems 


We can view sets as languages of decision problems. Then, $K_0$ and K (Fig. 8.4)
are the languages of decision problems $\mathcal{D}_{Halt}$ and $\mathcal{D}_H$, respectively. What about $\overline{K_0}$
and $\overline{K}$? These are the languages of decision problems $\mathcal{D}_{\overline{Halt}}$ and $\mathcal{D}_{\overline{H}}$, respectively,
where $\mathcal{D}_{\overline{Halt}}$ = “Given a Turing machine T and a word $w \in \Sigma^*$, does T never halt
on w?” and $\mathcal{D}_{\overline{H}}$ = “Given a Turing machine T, does T never halt on $\langle T\rangle$?” Now we
see (Fig. 8.5) that the class of all decision problems partitions into two nonempty
subclasses: the class of decidable problems and the class of undecidable problems.
There is also a third class, the class of semi-decidable problems, which contains all
the decidable problems and some, but not all, of the undecidable problems.
In other words, a decision problem $\mathcal{D}$ can be of one of three kinds:


• $\mathcal{D}$ is decidable. This means that there is an algorithm D that can solve an arbitrary
instance $d \in \mathcal{D}$. Such an algorithm D is called a decider of the problem $\mathcal{D}$.

• $\mathcal{D}$ is semi-decidable and undecidable. This means that no algorithm can solve an
arbitrary instance $d \in \mathcal{D}$, but there is an algorithm R that can solve an arbitrary
positive $d \in \mathcal{D}$. The algorithm R is called a recognizer of the problem $\mathcal{D}$.

• $\mathcal{D}$ is not semi-decidable. Now, for any algorithm there exist a positive instance
and a negative instance of $\mathcal{D}$ such that the algorithm cannot solve either of them.


[Figure: the class of all decision problems, with regions labeled decidable, semi-decidable, and undecidable; the problems $\mathcal{D}_{Halt}$, $\mathcal{D}_H$ and their complementary problems are marked in the undecidable region.]

Fig. 8.5 The three main classes of decision problems according to their decidability


8.2.2 Consequences: Complementary Sets and Decision Problems 


From the previous theorems it follows that there are only three possibilities for the
decidability of a set S and its complement $\overline{S}$:


1. S and $\overline{S}$ are decidable (e.g., △, ▲ in Fig. 8.6);
2. S and $\overline{S}$ are undecidable; one is semi-decidable and the other is not (e.g., ○, ●);
3. S and $\overline{S}$ are undecidable and neither is semi-decidable (e.g., □, ■).


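[Figure: the class of all sets, with regions labeled decidable, semi-decidable, and undecidable, and three pairs of complementary sets marked, one pair for each of the cases 1 to 3.]

Fig. 8.6 Decidability of complementary sets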


Formally, two complementary sets S and $\overline{S}$ are associated with two complementary
decision problems $\mathcal{D}_S$ and $\mathcal{D}_{\overline{S}}$, where $\mathcal{D}_{\overline{S}} = \overline{\mathcal{D}_S}$. Of course, for the decidability
of these two problems there are also only three possibilities (Fig. 8.7).


[Figure: the class of all decision problems, with regions labeled decidable, semi-decidable, and undecidable, and three pairs of complementary problems marked as in Fig. 8.6.]

Fig. 8.7 Decidability of complementary decision problems


Remark. Because of the situation ○ ●, two undecidable problems can differ in their undecidability.
Does this mean that ● is more difficult than ○? Are there different degrees of undecidability?
The answer is yes. Indeed, there is much to be said about this phenomenon, but we postpone the
treatment of it until Part III.


8.2.3 Consequences: There Is an Incomputable Function


We are now able to easily prove that there exists an incomputable function. Actually,
we foresaw this much earlier, in Sect. 5.3.3, by using the following simple argument.
There are $c$ different functions $f : \Sigma^* \to \Sigma^*$, where $c$ is the cardinality of the set of
real numbers. Among these functions there can be at most $\aleph_0$ computable functions
(as there are at most $\aleph_0$ different Turing machines, which compute functions' values).
Since $\aleph_0 < c$, there must be functions $f : \Sigma^* \to \Sigma^*$ that are not computable.
However, this proof is not constructive: It neither constructs a particular incomputable
function nor shows how such a function could be constructed, at least in
principle. In the intuitionistic view, this proof is not acceptable (see Sect. 2.2.2).
But now that we know that $K_0$ is undecidable, it is an easy matter to exhibit
a particular incomputable function. This is the characteristic function
$\chi_{K_0} : \Sigma^* \to \{0,1\}$, where

$$\chi_{K_0}(x) = \begin{cases} 1\ (\equiv \text{YES}), & \text{if } x \in K_0; \\ 0\ (\equiv \text{NO}), & \text{if } x \notin K_0. \end{cases}$$


Corollary 8.1. (Incomputable Function) The characteristic function $\chi_{K_0}$ is
incomputable.


Proof. If $\chi_{K_0}$ were computable, then $K_0$ would be decidable, contradicting Theorem 8.2.


By the same argument we also see that $\chi_K$ is incomputable.

Now the following question arises: Are there incomputable functions that are 
defined less generally than the characteristic functions of undecidable sets? The 
answer is yes. We will define such a function, the so-called Busy Beaver function, 
in the next section. 


8.3 Some Other Incomputable Problems 


We have seen that the formalization of the basic notions of computability led to 
the surprising discovery of a computational problem, the Halting Problem, an arbi- 
trary instance of which no single algorithm is capable of solving. Informally, this 
means that this problem is in general unsolvable; see Fig. 8.2 on p. 179. Of course, 
the following question was immediately raised: Are there any other incomputable 
problems? The answer is yes. Indeed, since the 1940s many other computational 
problems have been proved to be incomputable. The first of these problems were 
somewhat unnatural, in the sense that they referred to the properties and the opera- 
tions of models of computation. After 1944, however, more realistic incomputable 
problems were (and are still being) discovered in different fields of science and in 
other nonscientific fields. In this section we will list some of the known incom- 
putable problems, grouped by the fields in which they occur. (We will postpone the 
discussion of the methods for proving their incomputability to the next section.) 


8.3.1 Problems About Turing Machines


Halting of Turing Machines. We already know two incomputable problems about
Turing machines: the Halting Problem $\mathcal{D}_{Halt}$ and its subproblem $\mathcal{D}_H$. But there are
other incomputable problems about the halting of Turing machines.


A HALTING OF TURING MACHINES 
Let T and T' be arbitrary Turing machines and U the universal Turing machine.
Let $w \in \Sigma^*$ be an arbitrary word and $w_0 \in \Sigma^*$ an arbitrary fixed word. Questions:


$\mathcal{D}_{Halt}$ = “Does T halt on w?”
$\mathcal{D}_H$ = “Does T halt on $\langle T\rangle$?”
“Does T halt on empty input?”
“Does T halt on every input?”
“Does T halt on $w_0$?”
“Does U halt on w?”
“Do T and T' halt on the same inputs?”


These problems are all undecidable; no algorithm can solve any of them in general. 


Properties of Turing Machine Languages. Recall from Sect. 6.3.3 that L(T), the
proper set of a Turing machine T, contains exactly those words in $\Sigma^*$ on which T
halts in the accepting state. So, which Turing machines have empty and which ones
nonempty proper sets? Next, by Definition 6.10 (p. 142), L(T) is decidable if we can
algorithmically decide, for an arbitrary word, whether or not the word is in L(T).
Which Turing machines have decidable languages and which ones do not?


A PROPERTIES OF TM LANGUAGES 
Let T be an arbitrary Turing machine. Question: 


$\mathcal{D}_{Emp}$ = “Is the language L(T) empty?”
“Is the language L(T) decidable?”


Both problems are undecidable; no algorithm can solve either of them for arbitrary T. 


Busy Beaver Problems. Informally, a busy beaver is the most productive Turing
machine of its kind. Let us define the kind of Turing machines we are interested in.
Let $n \ge 1$ be a natural number. Define $\mathcal{T}_n$ to be the set of all two-way unbounded
Turing machines $T = (Q, \Sigma, \Gamma, \delta, q_1, \sqcup, F)$, where $Q = \{q_1, \ldots, q_{n+1}\}$, $\Sigma = \{0,1\}$,
$\Gamma = \{0,1,\sqcup\}$, $\delta : Q \times \Gamma \to Q \times \{1\} \times \{\text{Left}, \text{Right}\}$ and $F = \{q_{n+1}\}$. Informally,
$\mathcal{T}_n$ contains all Turing machines that have 1) the tape unbounded in both directions;
2) $n \ge 1$ non-final states (including the initial state $q_1$) and one final state $q_{n+1}$;
3) tape symbols 0, 1, $\sqcup$ and input symbols 0, 1; 4) instructions that write to a cell
only the symbol 1 and move the window either to the left or right. Such TMs have
unbounded space on their tape for writing the symbol 1, and they do not waste time
either writing symbols other than 1 or leaving the window as it is.

It can be shown that there are finitely many Turing machines in $\mathcal{T}_n$. (See Problem
8.3, p. 204.) Now, let us pick an arbitrary $T \in \mathcal{T}_n$ and start it on an empty input (i.e.,
with the tape containing $\sqcup$ in each cell). If T halts, we define its score $\sigma(T)$ to be
the number of symbols 1 that remain on the tape after halting. If, however, T does
not halt, we let $\sigma(T)$ be undefined. In this way we have defined a partial function
$\sigma : \mathcal{T}_n \to \mathbb{N}$, where

$$\sigma(T) = \begin{cases} k & \text{if } T \text{ halts on empty input and leaves on its tape } k \text{ symbols 1;} \\ \uparrow & \text{otherwise.} \end{cases}$$


We will say that a Turing machine $T \in \mathcal{T}_n$ is a stopper if it halts on an empty input.
Intuitively, we expect that, in general, different stoppers in $\mathcal{T}_n$ (if they exist) attain
different scores. Since there are only finitely many stoppers in $\mathcal{T}_n$, at least one of
them must attain the highest possible score in $\mathcal{T}_n$. But what if there were no stoppers
in $\mathcal{T}_n$? Then the highest score would not be defined for $\mathcal{T}_n$ (i.e., for that n). Can this
happen at all? The answer is no; in 1962 Radó⁵ proved that for every $n \ge 1$ there
exists a stopper in $\mathcal{T}_n$ that attains, among all the stoppers in $\mathcal{T}_n$, the maximum value
of $\sigma$. Such a stopper is called an n-state Busy Beaver and will be denoted by n-BB.
Consequently, the function $s : \mathbb{N} \to \mathbb{N}$, defined by

$$s(n) = \sigma(n\text{-BB})$$

is well defined (i.e., $s(n)\downarrow$ for every $n \in \mathbb{N}$). We call it the Busy Beaver function.
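To make the definitions of score and stopper concrete, here is a Python sketch that enumerates the machines described above for a small n and runs each one on an empty tape for a bounded number of steps. Everything about the encoding is an assumption for illustration: states are numbered 0..n with n as the final state, and only the blank and the symbol 1 are modelled, since on an empty input the symbol 0 never appears. Because of the step budget, the result is in general only a lower bound on s(n); a machine that has not halted within the budget is simply skipped, and no budget is known in advance to be sufficient.

```python
from itertools import product

BLANK, ONE = 0, 1          # the only symbols that can occur when starting on an empty tape
LEFT, RIGHT = -1, +1


def score(machine, n, max_steps):
    """Number of 1s left on the tape if the machine halts within max_steps, else None.

    machine maps (state, symbol) -> (next_state, move); every instruction writes 1.
    """
    ones, pos, state = set(), 0, 0   # `ones` holds the positions of cells containing 1
    for _ in range(max_steps):
        if state == n:               # final state reached: the machine is a stopper
            return len(ones)
        symbol = ONE if pos in ones else BLANK
        state, move = machine[(state, symbol)]
        ones.add(pos)
        pos += move
    return len(ones) if state == n else None


def best_score_within(n, max_steps):
    """Best score among machines with n non-final states that halt within the budget."""
    keys = [(q, s) for q in range(n) for s in (BLANK, ONE)]
    actions = [(q, m) for q in range(n + 1) for m in (LEFT, RIGHT)]
    best = None
    for choice in product(actions, repeat=len(keys)):
        result = score(dict(zip(keys, choice)), n, max_steps)
        if result is not None and (best is None or result > best):
            best = result
    return best


print(best_score_within(1, 20), best_score_within(2, 50))
```

For n = 1 and n = 2 the search is tiny, but the number of machines grows as $(2(n+1))^{2n}$, and, more importantly, no computable choice of `max_steps` works for all n.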


At last we can raise the following question: Given an arbitrary Turing machine of
the above kind (i.e., a machine in $\bigcup_{i \ge 1} \mathcal{T}_i$), can we algorithmically decide whether
or not the machine is an n-state Busy Beaver for some n? Interestingly, we cannot.


A BUSY BEAVER PROBLEM 
Let $T \in \bigcup_{i \ge 1} \mathcal{T}_i$ be an arbitrary Turing machine. Question:


$\mathcal{D}_{BB}$ = “Is T a Busy Beaver?”


The problem is undecidable. There is no algorithm that can solve it for arbitrary T.


What about the Busy Beaver function s? Is it computable? The answer is no. 


A BUSY BEAVER FUNCTION 
The Busy Beaver function is incomputable. 


No algorithm can compute, for arbitrary $n \ge 1$, the score of the n-state Busy Beaver.


5 Tibor Radó, 1895-1965, Hungarian-American mathematician.


8.3.2 Post’s Correspondence Problem


This problem was defined and proved to be undecidable by Post in 1946 as a result 
of his research into normal systems (see Sect. 6.3.2). It is important because it was 
one of the first realistic undecidable problems to be discovered and because it was 
used to prove the undecidability of several other decision problems. The problem 
can be defined as follows (Fig. 8.8).

Fig. 8.8 A positive instance of Post’s Correspondence Problem


Let C be a finite set of elements, called card templates. Each card template is 
divided into an upper and lower half. Each half contains a word over some alphabet
$\Sigma$. We call the two words the upper and the lower word of the card template. For
each card template there is a stack with a potentially infinite number of exact copies 
of the template, called the cards. Cards from any stack can be put one after another 
in an arbitrary order to form a finite sequence of cards, called the C-sequence. Each 
C-sequence defines two compound words, called sentences. The upper sentence U 
is the concatenation of all the upper words in the C-sequence. Similarly, the lower 
sentence L is the concatenation of all the lower words in the C-sequence. Now the 
question is: Is there a C-sequence such that U = L? 


A POST’S CORRESPONDENCE PROBLEM 
Let C be a finite set of card templates, each with an unlimited number of copies. 
Question: 


$\mathcal{D}_{PCP}$ = “Is there a finite C-sequence such that U = L?”


This problem is undecidable; no algorithm can solve it for arbitrary C. 
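Although no algorithm decides the problem, positive instances can be confirmed by systematic search. The sketch below tries all C-sequences up to a given length; the function name `find_match`, the representation of card templates as pairs of strings, and the sample cards are assumptions made for this illustration. A result of `None` means only that no matching sequence of at most `max_len` cards exists, not that the instance is negative.

```python
from itertools import product


def find_match(cards, max_len):
    """Search all sequences of at most max_len cards for one with equal sentences.

    cards is a list of (upper_word, lower_word) pairs; each may be used repeatedly.
    Returns a list of card indices, or None if no sequence within the bound matches.
    """
    for length in range(1, max_len + 1):
        for seq in product(range(len(cards)), repeat=length):
            upper = "".join(cards[i][0] for i in seq)
            lower = "".join(cards[i][1] for i in seq)
            if upper == lower:
                return list(seq)
    return None


# Hypothetical instance with three card templates.
cards = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
print(find_match(cards, 4))
```

Removing the bound turns this into a recognizer for positive instances: it halts exactly when a matching C-sequence exists.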


8.3.3 Problems About Algorithms and Computer Programs 


Termination of Algorithms and Programs. Algorithm designers and computer 
programmers are usually interested in whether their algorithms and programs al- 
ways terminate (i.e., do not get trapped into endless cycling). Since the Turing ma- 
chine formalizes the notion of algorithm, we can easily restate two of the above 
halting problems as the following undecidable problems. 


A TERMINATION OF ALGORITHMS (PROGRAMS)
Let A be an arbitrary algorithm and d be arbitrary input data. Questions:


$\mathcal{D}_{Term}$ = “Does an algorithm A terminate on every input data?”
“Does an algorithm A terminate on input data d?”


Both problems are undecidable. Consequently, there is no computer program capa- 
ble of checking, for an arbitrary computer program P, whether or not P eventually 
terminates (even if the input data is fixed). This holds irrespectively of the program- 
ming language used to encode the algorithms. 


Correctness of Algorithms and Programs. Algorithm designers and computer 
programmers are most interested in their algorithms and programs correctly solving 
their problems of interest. It would therefore be highly advantageous to construct 
a computer program V that would verify, for an arbitrary problem P and an arbi- 
trary algorithm A, whether or not A correctly solves P. In order to be read by such 
a would-be verifier, the problem P and the algorithm A must be appropriately en- 
coded. So let code(P) denote the word encoding the problem P, and let code(A) 
denote the computer program describing the algorithm A. 


A CORRECTNESS OF ALGORITHMS (PROGRAMS) 
Let P be an arbitrary computational problem and A an arbitrary algorithm. Ques- 
tion: 


$\mathcal{D}_{Corr}$ = “Does the algorithm code(A) correctly solve the problem code(P)?”


The problem is undecidable; there is no algorithm (verifier) capable of solving it 
for arbitrary P and A. This is true irrespective of the programming language used 
to write the algorithms. Indeed, the problem remains undecidable even if code(A) is 
allowed to use only the most elementary data types (e.g., integers, character strings) 
and perform only the most elementary operations on them. This is bad news for 
those who work in the field of program verification. 


Shorter Equivalent Programs. Sometimes, programmers try to shorten their pro- 
grams in some way. They may want to reduce the number of program statements, 
or the total number of symbols, or the number of variables in their programs. In 
any case, they use some reasonable measure of program length. While striving for 
a shorter program they want to retain its functionality; that is, a shortened program 
should be functionally equivalent to the original one. Here, we define two programs 
as being equivalent if, for every input, they return equal results. So, the question is: 
Is there a shorter equivalent program? In general, this is an undecidable problem. 


A EXISTENCE OF SHORTER EQUIVALENT PROGRAMS 
Let code(A) be a program describing an algorithm A. Question: 


“Given a program code(A), is there a shorter equivalent program?”


This is an undecidable problem; no algorithm can solve it in general. This holds 
for any reasonable definition of program length and for any programming language 
used to code algorithms. 


8.3.4 Problems About Programming Languages and Grammars 


The syntax of a language—be it natural, programming, or purely formal—is de- 
scribed by means of a grammar. This can be viewed as a Post canonical system (see 
Box 6.2 on p. 138) whose productions have been restricted (i.e., simplified) in any 
of several possible ways. For example, natural languages, such as English, devel- 
oped complex syntaxes that demand so-called context-sensitive grammars to deal 
with. In contrast, programming languages, which are artificial, have their syntaxes 
defined by simpler grammars, the so-called context-free grammars (CFGs). This is 
because such grammars simplify the recognition (parsing) of computer programs 
(i.e., checking whether the programs are syntactically correct). This in turn allows 
us to construct simpler and faster parsers (and compilers). Finally, the basic build- 
ing blocks of computer programs, the so-called tokens, are words with even simpler 
syntax. Tokens can be described and recognized with so-called regular grammars. 
In summary, a grammar G can be used in several ways: to define a language, which 
we will denote by L(G); or to generate (some) elements of L(G); or to recognize the 
language L(G). For example, given a string w of symbols (be it a natural-language 
sentence or a computer program), we can answer the question $w \in? L(G)$ by trying
to generate w using G. In particular, the parser tries to generate the given computer 
program using G. When G is a CFG, we say that L(G) is a context-free language 
(CFL); and when G is regular, L(G) is said to be a regular language. Many problems 
about grammars and their languages have been defined in the fields of programming 
language design and translation. Some of these are incomputable. 


Ambiguity of CFG Grammars. Programming language designers are only inter- 
ested in grammars that do not allow a computer program to be parsed in two different 
ways. If this happened, it would mean that the structure and meaning of the program 
could be viewed in two different ways. Which of the two was intended by the pro- 
grammer? Such a grammar would be ambiguous. So the detection of ambiguous 
grammars is an important practical problem. 


A AMBIGUITY OF CFG GRAMMARS 
Let G be a context-free grammar. Question: 


“Is there a word that can be generated by G in two different ways?” 


The problem is undecidable; there is no algorithm capable of solving it for arbitrary 
G. As a consequence, programming language designers must invent and apply, for 
different Gs, different approaches to prove that G is unambiguous. To ease this, they
additionally restrict feasible CFGs and deal with, for example, the so-called LL(k) 
grammars. Since CFGs are a subclass of the class of context-sensitive grammars, 
the above decision problem is a subproblem of the problem “Is there a word that 
can be generated by a context-sensitive grammar G in two different ways?” Hence, 
the latter is undecidable as well. 
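Ambiguity can at least be detected by bounded search: a word is generated in two different ways exactly when it has two different leftmost derivations. The following Python sketch counts leftmost derivations of all words up to a given length. The grammar encoding (a dictionary from nonterminals to lists of right-hand sides) and the name `ambiguous_words` are assumptions; the sketch also assumes a grammar without ε-productions and without unit productions, so that the length bound guarantees termination. An empty result shows only that no ambiguous word of that length exists.

```python
from collections import Counter, deque


def ambiguous_words(rules, start, max_len):
    """Words of length <= max_len that have at least two leftmost derivations.

    rules maps a nonterminal to a list of right-hand sides (tuples of symbols).
    Assumes no epsilon-productions and no unit productions.
    """
    counts = Counter()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, if any.
        idx = next((i for i, sym in enumerate(form) if sym in rules), None)
        if idx is None:
            counts["".join(form)] += 1      # one more derivation of this word
            continue
        for rhs in rules[form[idx]]:
            new_form = form[:idx] + tuple(rhs) + form[idx + 1:]
            if len(new_form) <= max_len:    # forms never shrink, so longer ones are useless
                queue.append(new_form)
    return sorted(word for word, k in counts.items() if k > 1)


# Hypothetical grammar E -> E + E | a, which does not fix how sums are grouped.
print(ambiguous_words({"E": [("E", "+", "E"), ("a",)]}, "E", 5))
```

For the sample grammar the word a+a+a is reported, because it can be grouped as (a+a)+a or as a+(a+a).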


Equivalence of CFG Grammars. Sometimes a programming language designer 
wants to improve a grammar at hand while retaining the old generated language. 
So, after improving a grammar $G_1$ to a grammar $G_2$, the designer asks whether or
not $L(G_2) = L(G_1)$. In other words, he or she asks whether or not $G_1$ and $G_2$ are
equivalent grammars.


A EQUIVALENCE OF CFG GRAMMARS 
Let $G_1$ and $G_2$ be CFGs. Question:


“Do $G_1$ and $G_2$ generate the same language?”


This problem is undecidable; no algorithm can solve it for arbitrary $G_1$ and $G_2$.
Language designers must invent, for different pairs $G_1, G_2$, different approaches to
prove their equivalence. As above, the equivalence of context-sensitive grammars is 
an undecidable problem too. 


Some Other Properties of CFGs and CFLs. There are other practical problems 
about context-free grammars and languages that turned out to be undecidable. Some 
are listed below. We leave it to the reader to interpret each of them as a problem of 
programming language design. 


A OTHER PROPERTIES OF CFGs AND CFLS 
Let G and G' be arbitrary CFGs, and let C and R be an arbitrary CFL and a
regular language, respectively. As usual, $\Sigma$ is the alphabet. Questions:


“Is $L(G) = \Sigma^*$?”
“Is L(G) regular?”
“Is $R \subseteq L(G)$?”
“Is L(G) = R?”
“Is $\overline{L(G)}$ a CFL?”
“Is $L(G) \subseteq L(G')$?”
“Is $L(G) \cap L(G') = \emptyset$?”
“Is $L(G) \cap L(G')$ a CFL?”
“Is C an ambiguous CFL?”
“Is there a palindrome in L(G)?”


Each of these problems is undecidable. No algorithm can solve it in general. Of
course, these problems are also undecidable for context-sensitive G, G', and C.


Remark. As we saw, all the above undecidability results extend to context-sensitive grammars and
languages. This was bad news for researchers in linguistics, such as Chomsky,⁶ who did not expect
a priori limitations on the mechanical, algorithmic processing of natural languages.


8.3.5 Problems About Computable Functions 


For many properties of computable functions the following holds: The property
depends neither on how the function’s values are computed (i.e., with what program
or algorithm) nor on where the computation is performed (i.e., on what computer
or model of computation). Such a property is intrinsic to the function itself, that is,
it belongs to the function as a correspondence $\varphi : \mathcal{A} \to \mathcal{B}$ between two sets. For
instance, totality is such a property, because whether or not $\varphi : \mathcal{A} \to \mathcal{B}$ is total depends
only on the definition of $\varphi$, $\mathcal{A}$, and $\mathcal{B}$, and it has nothing to do with the actual
computation of $\varphi$’s values. Deciding whether or not an arbitrary computable function
has such an intrinsic property is usually an undecidable problem.


A INTRINSIC PROPERTIES OF COMPUTABLE FUNCTIONS 
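Let $\varphi : \mathcal{A} \to \mathcal{B}$ and $\psi : \mathcal{A} \to \mathcal{B}$ be arbitrary computable functions. Questions:


“Is dom($\varphi$) empty?”
$\mathcal{D}_{Fin}$ = “Is dom($\varphi$) finite?”
$\mathcal{D}_{Inf}$ = “Is dom($\varphi$) infinite?”
$\mathcal{D}_{Cof}$ = “Is $\mathcal{A} - \text{dom}(\varphi)$ finite?”
$\mathcal{D}_{Tot}$ = “Is $\varphi$ total?”
$\mathcal{D}_{Ext}$ = “Can $\varphi$ be extended to a total computable function?”
$\mathcal{D}_{Sur}$ = “Is $\varphi$ surjective?”
“Is $\varphi$ defined at x?”
“Is $\varphi(x) = y$ for at least one x?”
“Is dom($\varphi$) = dom($\psi$)?”
“Is $\varphi \simeq \psi$?”


All of the above problems are undecidable; no algorithm can solve any of them.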


Remark. We can now understand the difficulties, described in Sect. 5.3.3, that would have arisen if
we had used only total functions to formalize the basic notions of computation: The basic notions
of computation would have been defined in terms of an undecidable mathematical notion (i.e., the
property of being a total function).


6 Avram Noam Chomsky, b. 1928, American linguist and philosopher.


8.3.6 Problems from Number Theory


Solvability of Diophantine Equations. Let $p(x_1, x_2, \ldots, x_n)$ be a polynomial with
unknowns $x_i$ and integer coefficients. A Diophantine equation is the equation
$p(x_1, x_2, \ldots, x_n) = 0$, for some $n \ge 1$, where only integer solutions $x_1, x_2, \ldots, x_n$
are sought. For example, the Diophantine equation 2x+ — 4xy* + 5yz+3z7 —6 =0 
asks for all the triples $(x, y, z) \in \mathbb{Z}^3$ that satisfy it. Depending on the polynomial
$p(x_1, x_2, \ldots, x_n)$, the Diophantine equation may or may not have a solution. For
example, 6x + 8y — 24 = 0 and xz? + y*¢? — 2z+ = 0 both have a solution, while 
$6x + 8y - 25 = 0$ and $x^2 + y^2 - 3 = 0$ do not. Since Diophantine equations find
many applications in practice, it would be beneficial to devise an algorithm that 
would decide, for arbitrary $p(x_1, x_2, \ldots, x_n)$, whether or not the Diophantine equation
$p(x_1, x_2, \ldots, x_n) = 0$ has a solution. This is known as Hilbert’s tenth problem,
the tenth in his list of 23 major problems that were unresolved in 1900. 


A SOLVABILITY OF DIOPHANTINE EQUATIONS 
Let $p(x_1, x_2, \ldots, x_n)$ be an arbitrary polynomial with unknowns $x_1, x_2, \ldots, x_n$ and
rational integer coefficients. Question:


“Does a Diophantine equation $p(x_1, x_2, \ldots, x_n) = 0$ have a solution?”


This problem is undecidable. There is no algorithm capable of solving it for an 
arbitrary polynomial p. This was proved in 1970 by Matiyasevič,⁷ who built on the
results of Davis,⁸ Putnam,⁹ and Robinson.¹⁰
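The undecidability concerns the general question; for a solvable equation a solution can always be found by searching ever larger boxes of integers. Here is a minimal Python sketch of one such bounded search, with the hypothetical name `find_solution` and the polynomial passed as an ordinary Python function. A result of `None` says only that no solution has all coordinates between `-bound` and `bound`.

```python
from itertools import product


def find_solution(p, n_vars, bound):
    """Return integers (x_1, ..., x_n) in [-bound, bound]^n with p(...) == 0, or None."""
    values = range(-bound, bound + 1)
    for xs in product(values, repeat=n_vars):
        if p(*xs) == 0:
            return xs
    return None


print(find_solution(lambda x, y: 6 * x + 8 * y - 24, 2, 10))   # some solution is found
print(find_solution(lambda x, y: 6 * x + 8 * y - 25, 2, 50))   # nothing within the box
```

For 6x + 8y − 25 = 0 the search fails for every bound, since the left-hand side is always odd; in general, however, no algorithm can tell when it is safe to stop enlarging the box.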


8.3.7 Problems from Algebra 


Mortal Matrix Problem. Let $m, n \ge 1$ and let $\mathcal{M} = \{M_1, \ldots, M_m\}$ be a set of $n \times n$
matrices with integer entries. Choose an arbitrary finite sequence $i_1, i_2, \ldots, i_k$ of
numbers $i_j$, where $1 \le i_j \le m$, and multiply the matrices in this order to obtain
the product $M_{i_1} \times M_{i_2} \times \cdots \times M_{i_k}$. Can it happen that the product is equal to O, the
zero matrix of order $n \times n$? The question is known as the Mortal Matrix Problem.


A MORTAL MATRIX PROBLEM 
Let M be a finite set of n x n matrices with integer entries. Question: 


“Can the matrices of $\mathcal{M}$ be multiplied in some order, possibly with repetition,
so that the product is the zero matrix O?”


The problem is undecidable. No algorithm can solve it for arbitrary M. 
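As with the other problems in this section, a positive answer can be confirmed by search, while a negative one cannot in general be established this way. The Python sketch below tries all products of at most `max_len` factors; the names `matmul` and `find_mortal_product`, the representation of matrices as tuples of tuples, and the sample matrices are assumptions for illustration.

```python
from itertools import product


def matmul(a, b):
    """Product of two square integer matrices given as tuples of tuples."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )


def find_mortal_product(matrices, max_len):
    """Indices of at most max_len matrices whose product is the zero matrix, or None."""
    n = len(matrices[0])
    zero = tuple((0,) * n for _ in range(n))
    for length in range(1, max_len + 1):
        for seq in product(range(len(matrices)), repeat=length):
            result = matrices[seq[0]]
            for i in seq[1:]:
                result = matmul(result, matrices[i])
            if result == zero:
                return list(seq)
    return None


# Hypothetical instance: two projections onto different coordinate axes.
A = ((1, 0), (0, 0))
B = ((0, 0), (0, 1))
print(find_mortal_product([A, B], 3))
```

Here neither matrix is zero, nor is any power of A or B, but the product A × B is.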


7 Juri Vladimirovič Matiyasevič, b. 1947, Russian mathematician and computer scientist.

8 Martin David Davis, b. 1928, American mathematician.

9 Hilary Whitehall Putnam, 1926-2016, American philosopher, mathematician, computer scientist.

10 Julia Hall Bowman Robinson, 1919-1985, American mathematician.


Word Problems. Let $\Sigma$ be an arbitrary alphabet. A word on $\Sigma$ is any finite sequence
of the symbols of $\Sigma$. As usual, $\Sigma^*$ is the set of all the words on $\Sigma$, including the
empty word $\varepsilon$. If $u, v \in \Sigma^*$, then uv denotes the word that is the concatenation (i.e.,
juxtaposition) of the words u and v. A rule over $\Sigma$ is an expression $x \to y$, where
$x, y \in \Sigma^*$. Let R be an arbitrary finite set of rules over $\Sigma$. The pair $(\Sigma, R)$ is called
the semi-Thue system.¹¹ In a semi-Thue system $(\Sigma, R)$ we can investigate whether
and how a word of $\Sigma^*$ can be transformed, using only the rules of R, into another
word of $\Sigma^*$. Given two words $u, v \in \Sigma^*$, a transformation of u into v in the semi-Thue
system $(\Sigma, R)$ is a finite sequence of words $w_1, \ldots, w_n \in \Sigma^*$ such that 1) $u = w_1$ and
$w_n = v$ and 2) for each $i = 1, \ldots, n-1$ there exists a rule in R, say $x_i \to y_i$, such
that $w_i = p x_i s$ and $w_{i+1} = p y_i s$, where the prefix p and suffix s are words in $\Sigma^*$,
possibly empty. We write $u \xrightarrow{*} v$ to assert that there exists a transformation of u into
v in the semi-Thue system $(\Sigma, R)$. Then the word problem for semi-Thue system
$(\Sigma, R)$ is the problem of determining, for arbitrary words $u, v \in \Sigma^*$, whether or not
$u \xrightarrow{*} v$. We can impose additional requirements on semi-Thue systems and obtain
Thue systems. Specifically, a Thue system is a semi-Thue system $(\Sigma, R)$ in which
$x \to y \in R \iff y \to x \in R$. Thus, in a Thue system we have $u \xrightarrow{*} v \iff v \xrightarrow{*} u$.


A WORD PROBLEM FOR SEMI-GROUPS 
Let $(\Sigma, R)$ be an arbitrary Thue system and $u, v \in \Sigma^*$ be arbitrary words.
Question:


“Can u be transformed into v in the Thue system $(\Sigma, R)$?”


This problem is undecidable. No algorithm can solve it for arbitrary u, v and $\Sigma$, R.


Let $(\Sigma, R)$ be a Thue system in which the following holds: For every $a \in \Sigma$ there
is a $b \in \Sigma$ such that both $\varepsilon \to ba$ and $ba \to \varepsilon$ are rules of R. Informally, every symbol
a has a symbol b that annihilates it. (It follows that every word has an annihilating
“inverse” word.) For such Thue systems the following problem arises.


A WORD PROBLEM FOR GROUPS 
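Let $(\Sigma, R)$ be an arbitrary Thue system where
$\forall a \in \Sigma\ \exists b \in \Sigma\ (\varepsilon \to ba \in R \wedge ba \to \varepsilon \in R)$, and let $u, v \in \Sigma^*$ be arbitrary words. Question:


“Is $u \xrightarrow{*} v$ in $(\Sigma, R)$?”


The problem is undecidable; no algorithm can solve it for arbitrary u, v, $\Sigma$, R.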


Remark. This is fine, but where are the semi-groups and groups? In a Thue system $(\Sigma, R)$ the
relation $\xrightarrow{*}$ is an equivalence relation, so $\Sigma^*$ is partitioned into equivalence classes. The set $\Sigma^*/\xrightarrow{*}$
of all the classes, together with the operation of concatenation of classes, is a semi-group $(\Sigma^*/\xrightarrow{*}, \cdot)$.
If, in addition, the rules R fulfill the above “annihilation” requirement, then $(\Sigma^*/\xrightarrow{*}, \cdot)$ is a group.


11 Axel Thue, 1863-1922, Norwegian mathematician.


Let $\mathcal{E}$ be a finite set of equations between words. Sometimes, a new equation
can be deduced from $\mathcal{E}$ by a finite number of substitutions and concatenations and
by using the transitivity of the relation “=”. Here is an example. Let $\mathcal{E}$ contain the
equations

1. bc = cba

2. ba = abc

3. ca = ac


We can deduce the equation abcc = cacacbaa. How? By concatenating the symbol
a to both sides of equation 1 we get the equation bca = cbaa. Its left-hand side can
be transformed by a series of substitutions as follows:
$bca \overset{3.}{=} bac \overset{2.}{=} abcc$. Similarly, we transform its right-hand side:
$cbaa \overset{2.}{=} cabca \overset{1.}{=} cacbaa \overset{2.}{=} cacabca \overset{1.}{=} cacacbaa$.
Since “=” is transitive, we finally obtain the equation abcc = cacacbaa. Can we
algorithmically check whether or not an equation follows from a set of equations?


A EQUALITY OF WORDS 
Let $\mathcal{E}$ be an arbitrary finite set of equations between words and let u, v be two
arbitrary words. Question:


“Does u = v follow from $\mathcal{E}$?”


The problem is undecidable. No algorithm can solve it for arbitrary u, v, $\mathcal{E}$.
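The deduction in the example above can be reproduced mechanically by a breadth-first search that applies the equations as substitutions in both directions. The sketch below is an illustration under explicit assumptions: the name `follows` is hypothetical, and the search is restricted to words of at most `max_len` symbols so that it terminates. A result of `False` therefore means only that no derivation exists within that length bound.

```python
from collections import deque


def follows(u, v, equations, max_len):
    """Search for a chain of substitutions from u to v using words of length <= max_len.

    equations is a list of (left, right) pairs; each may be applied in both directions.
    """
    rules = list(equations) + [(r, l) for l, r in equations]
    seen, queue = {u}, deque([u])
    while queue:
        w = queue.popleft()
        if w == v:
            return True
        for lhs, rhs in rules:
            start = w.find(lhs)
            while start != -1:                       # try every occurrence of lhs in w
                new = w[:start] + rhs + w[start + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                start = w.find(lhs, start + 1)
    return False


E = [("bc", "cba"), ("ba", "abc"), ("ca", "ac")]
print(follows("abcc", "cacacbaa", E, 8))
```

The bound 8 suffices here because every word in the chain abcc, bac, bca, cbaa, cabca, cacbaa, cacabca, cacacbaa has at most eight symbols. In general, intermediate words may have to be much longer than u and v, and no computable bound on their length exists.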


8.3.8 Problems from Analysis 


Existence of Zeros of Functions. We say that a function f(x) is elementary if 
it can be constructed from a finite number of exponentials $e^{(\cdot)}$, logarithms $\log(\cdot)$,
roots $\sqrt[n]{(\cdot)}$, real constants, and the variable x by using function composition and
the four basic operations +, —, x, and +. For example, fe en is an elementary 
function. If we allow these functions and the constants to be complex, then also 
trigonometric functions and their inverses become elementary. Now, often we are 
interested in zeros of functions. Given a function, before we attempt to compute its 
zeros, it is a good idea to check whether or not they exist. Can we do the checking 
algorithmically? 


A EXISTENCE OF ZEROS OF FUNCTIONS 
Let $f : \mathbb{R} \to \mathbb{R}$ be an arbitrary elementary function. Question:


“Is there a real solution to the equation f(x) = 0?”


The problem is undecidable. No algorithm can answer this question for arbitrary f. 
Consequently, the problem is undecidable for general functions f. 


8.3.9 Problems from Topology


Classification of Manifolds. Topological manifolds play an important role in the
fields of mathematics and physics (e.g., topology, general relativity). Intuitively,
the simplest topological manifolds are like curves and surfaces, but they can also
be of higher dimension and impossible to picture in $\mathbb{R}^3$. Here the dimension
is the number of independent numbers needed to specify an arbitrary point in the
manifold. An n-dimensional topological manifold is called an n-manifold for short.

The crucial property of n-manifolds is, roughly speaking, that they locally “look
like” Euclidean space $\mathbb{R}^n$. Let us make this more precise. First, two sets $U \subseteq \mathbb{R}^k$ and
$V \subseteq \mathbb{R}^n$ are said to be homeomorphic if there is a bijection $h : U \to V$ such that both
h and $h^{-1}$ are continuous. Such a function h is called a homeomorphism. Intuitively,
if U and V are homeomorphic, then they can be continuously deformed one into
the other without tearing or collapsing. Second, a set $M \subseteq \mathbb{R}^k$ is said to be locally
Euclidean of dimension n if every point of M has a neighborhood in M that is
homeomorphic to a ball in $\mathbb{R}^n$. At last we can define: An n-manifold is a subset of
some $\mathbb{R}^k$ that is locally Euclidean of dimension n.

The Euclidean space ℝ^3 is a 3-manifold. But there are many more: 1-manifolds
are curves (e.g., line, circle, parabola, other curves), while 2-manifolds are surfaces
(e.g., plane, sphere, cylinder, ellipsoid, paraboloid, hyperboloid, torus). For n ≥ 3,
n-manifolds cannot be visualized (with the exception of ℝ^3 or parts of it).

There is one more important notion that we will need. A topological invariant is 
a property that is preserved by homeomorphisms. For example, the dimension of a 
manifold is a topological invariant, but there are others too. If two manifolds have 
different invariants, they are not homeomorphic. 

One of the most important problems of topology is to classify manifolds up 
to topological equivalence. This means that we want to produce, for each dimen- 
sion n, a list of n-manifolds, called n-representatives, such that every n-manifold is 
homeomorphic to exactly one n-representative. Ideally, we would also like to de- 
vise an algorithm that would compute, for any given n-manifold, the corresponding 
n-representative (or, rather, its location in the list). Unfortunately, the problem is
incomputable for dimensions n ≥ 4. Specifically, in 1958 Markov proved that for
n ≥ 4 the question of whether or not two arbitrary n-manifolds are homeomorphic
is undecidable.


CLASSIFICATION OF MANIFOLDS
Let n ≥ 4 and let M_1, M_2 be arbitrary topological n-manifolds. Question:


“Are topological n-manifolds M_1 and M_2 homeomorphic?”


The problem is undecidable. There is no algorithm capable of deciding whether two
arbitrary manifolds of dimension four or more are homeomorphic.


8.3.10 Problems from Mathematical Logic


Decidability of First-Order Theories. Mathematical logic, too, has many problems
that turned out to be undecidable. After Hilbert’s program was set, one of the main
problems of mathematical logic was to solve the Decidability Problem for M, where M
denotes the theory belonging to the sought-for formal axiomatic system for all
mathematics. The Decidability Problem for M is denoted by D_Dec(M) and defined as
D_Dec(M) = “Is F a theorem of M?”, where F can be an arbitrary formula of M. If this
problem were decidable, then the corresponding decider D_Entsch could today be
programmed and run on modern computers. Such programs, called automatic theorem
provers, would ease research in mathematics: One would only need to construct a
relevant mathematical proposition, submit it to the automatic prover, and wait for
the positive or negative answer. However, in 1936, Hilbert’s challenge ended with an
unexpected result that was found independently by Church and Turing. In short, the
problem D_Dec(M) is undecidable. What is more, this holds for any first-order formal
axiomatic system 𝐅.


DECIDABILITY OF FIRST-ORDER THEORIES
Let 𝐅 be an arbitrary first-order theory and F an arbitrary formula of 𝐅. Question:


D_Dec(𝐅) = “Is F a theorem of 𝐅?”


This problem is undecidable; no algorithm can solve it for arbitrary F ∈ 𝐅. Specifically,
there is no algorithm D_Entsch that can decide whether or not an arbitrary
mathematical formula is provable.


Remark. So yet another goal of Hilbert’s program, the Entscheidungsproblem (goal D, Sect. 4.1.2)
has received a negative answer. We will describe the proof on p. 217. Let us add that certain
particular theories can be decidable; for instance, in 1921, Post proved that the Propositional
Calculus P is a decidable theory. He devised an algorithm that, for an arbitrary proposition of P,
decides whether or not the proposition is provable in P. The algorithm uses truth-tables. An obvious
practical consequence of the undecidability of D_Dec(𝐅) is that designers of automatic theorem provers
are aware of the hazard that their programs, no matter how improved, may not halt on some inputs.


Satisfiability and Validity of First-Order Formulas. Recall that to interpret a
theory 𝐅 one has to choose a particular set S and particular functions and relations
defined on S, and define, for every formula F of 𝐅, how it is to be understood as a
statement about the members, functions, and relations of S (see Sect. 3.1.3). So let
us be given a theory 𝐅 and an interpretation ι of it. If F has no free variable symbols,
then its interpretation ι(F) is either a true or a false statement about the state of
affairs in S. If, however, F contains free variable symbols, then, as soon as all the
free variable symbols are assigned values, the formula ι(F) becomes either a true
or a false statement about S. Obviously, the assignment of values to free variable
symbols can affect the truth-value of ι(F). In general, there are many possible
assignments of values to free variable symbols. A formula F is said to be satisfiable
under the interpretation ι if ι(F) becomes true for at least one assignment of values
to its free variable symbols. And a formula F is said to be valid under the
interpretation ι if ι(F) becomes true for every assignment of values to its free variable
symbols. Finally, if a formula is valid under every interpretation, it is said to be
logically valid. For instance, in a first-order theory the formula ¬∀x P(x) ⇒ ∃x ¬P(x) is
logically valid. Satisfiability and validity are useful properties; it would be beneficial
to be able to recognize formulas with such properties algorithmically. Can we do
this? Unfortunately, we cannot.


SATISFIABILITY OF FIRST-ORDER FORMULAS
Let 𝐅 be a first-order theory, ι an interpretation of 𝐅, and F an arbitrary formula
of 𝐅. Question:


D_Sat(𝐅,ι) = “Is F satisfiable under ι?”


This problem is undecidable. There is no algorithm capable of solving it for an
arbitrary F. However, if 𝐅 = P, the Propositional Calculus, the problem is decidable.
In that case, the problem is stated as follows:


D_Sat(P) = “Can variable symbols of a Boolean expression F be assigned
truth-values in such a way that F attains the truth-value ‘true’?”


This is the so-called Satisfiability Problem for Boolean Expressions, which plays an
important role in Computational Complexity Theory.
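
The decidability of D_Sat(P) can be illustrated by a truth-table search. The following is a minimal sketch, not the method of any particular solver; it assumes (for illustration only) that a Boolean expression is given as a Python function over a dictionary of truth-values, and the names find_satisfying_assignment, xor_like, and contradiction are hypothetical.

```python
from itertools import product

def find_satisfying_assignment(expr, variables):
    """Try all 2^n rows of the truth table; return a satisfying row or None."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if expr(assignment):
            return assignment
    return None  # every row makes expr false, so expr is unsatisfiable

# (x or y) and (not x or not y): satisfiable
xor_like = lambda a: (a["x"] or a["y"]) and (not a["x"] or not a["y"])
# x and not x: unsatisfiable
contradiction = lambda a: a["x"] and not a["x"]

print(find_satisfying_assignment(xor_like, ["x", "y"]))
print(find_satisfying_assignment(contradiction, ["x"]))
```

The search always halts because a Boolean expression has only finitely many variables, each with two possible values. No such exhaustive table exists for first-order formulas under an interpretation with an infinite set S.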


VALIDITY OF FIRST-ORDER FORMULAS
Let 𝐅 be a first-order theory, ι an interpretation of 𝐅, and F an arbitrary formula
of 𝐅. Question:


D_Val(𝐅,ι) = “Is F valid under ι?”


The problem is undecidable; no algorithm can solve it for an arbitrary F.


8.3.11 Problems About Games 


Tiling Problems. Let 𝒯 be a finite set of elements, called tile templates. Each tile
template is a 1 × 1 square object with each of its four sides colored in one of finitely
many colors (see Fig. 8.9). For each tile template there is a stack with a potentially
infinite number of exact copies of the template, called the tiles.

Next, let the plane be partitioned into 1 × 1 squares, that is, view the plane as the
set ℤ^2. Any finite subset of ℤ^2 can be viewed as a polygon with a border consisting
only of horizontal and vertical sides, and vice versa. Such a polygon can be 𝒯-tiled,
which means that every 1 × 1 square of the polygon is covered with a tile from
an arbitrary stack associated with 𝒯. Without any loss of generality, we assume that
the tiles cannot be rotated or reflected.

Now, a 𝒯-tiling of a polygon is said to be regular if every two neighboring tiles of
the tiling match in the color of their common sides. The question is: Does the set 𝒯
suffice to regularly 𝒯-tile an arbitrary polygon? Can we decide this algorithmically
for an arbitrary 𝒯? The answer is no.


DOMINO TILING PROBLEM
Let 𝒯 be a finite set of tile templates, each with an unlimited number of copies
(tiles). Question:


“Can every finite polygon be regularly 𝒯-tiled?”


This problem is undecidable. No algorithm can solve it for an arbitrary set 𝒯.
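
For a single given polygon the question is decidable, because there are only finitely many ways to place tiles on finitely many squares. The following backtracking search is a minimal sketch of this restricted case. It assumes, purely for illustration, that a tile template is a tuple of colors (top, right, bottom, left) and that a polygon is a set of integer pairs; the name find_regular_tiling is hypothetical.

```python
def find_regular_tiling(templates, cells):
    """Search for a regular tiling of the finite polygon `cells`; return it or None."""
    order = sorted(cells)
    placed = {}

    def fits(cell, tile):
        x, y = cell
        top, right, bottom, left = tile
        # (neighboring square, color of our side, index of the neighbor's facing side)
        checks = [((x, y + 1), top, 2), ((x + 1, y), right, 3),
                  ((x, y - 1), bottom, 0), ((x - 1, y), left, 1)]
        return all(placed[nb][side] == color
                   for nb, color, side in checks if nb in placed)

    def extend(k):
        if k == len(order):
            return True
        cell = order[k]
        for tile in templates:  # unlimited copies: any template may be reused
            if fits(cell, tile):
                placed[cell] = tile
                if extend(k + 1):
                    return True
                del placed[cell]  # backtrack
        return False

    return dict(placed) if extend(0) else None

square = {(0, 0), (1, 0), (0, 1), (1, 1)}
print(find_regular_tiling([("r", "g", "r", "g")], square))
```

The undecidable problem quantifies over every finite polygon, so running this search polygon by polygon never terminates with a YES answer; it can only ever discover a polygon that cannot be tiled.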


Fig. 8.9 A positive instance of the domino tiling problem: tile templates, a polygon, and a regular 𝒯-tiling of the polygon


A path p(A,B,X) is any finite sequence of 1 × 1 squares in ℤ^2 that connects the
square A to the square B and does not cross the square X (see Fig. 8.10).


DOMINO SNAKE PROBLEM
Let 𝒯 be a finite set of tile templates and let A, B, and X be arbitrary 1 × 1 squares
in ℤ^2. Question:


“Is there a path p(A,B,X) that can be regularly 𝒯-tiled?”


The problem is undecidable; there is no algorithm capable of solving it for an
arbitrary 𝒯, A, B, X.


Fig. 8.10 A positive instance of the domino snake problem: tile templates and a regular 𝒯-tiling of a path p(A,B,X)


8.4 Can We Outwit Incomputable Problems? 


We said that for an incomputable (undecidable) problem 𝒫 there exists no single
algorithm A capable of solving an arbitrary instance p of the problem. However,
in practice it may happen that we are only confronted with a particular instance p_i
of 𝒫. In this case there exists an algorithm A_i that can solve the particular instance
p_i. The algorithm A_i must initially check whether or not the input data actually
describe the instance p_i and, if so, start solving p_i. If we succeed in constructing the
algorithm A_i, it will necessarily be designed specially for the particular instance p_i.
In general, A_i will not be useful for solving any other instance of the problem 𝒫.


In the following, we discuss two attempts to get around the harsh statement that 
A does not exist: 


1. It seems that we could construct the sought-for general algorithm A simply by
combining all the particular algorithms A_i for all the instances p_i ∈ 𝒫 into one
unit. That is,


Algorithm A(p):
begin
   if p = p_1 then A_1 else
   if p = p_2 then A_2 else
   ...
   if p = p_i then A_i else
   ...
end.


However, there is a pitfall in this approach. In general, the particular algorithms
A_i differ from one another; we say that they are not uniform. This is because we
lack a single method for constructing the particular algorithms. (If such a method
existed, then the construction of A would go along with this method, and the
problem 𝒫 would be computable.) Consequently, we must be sufficiently ingenious,
for each instance p_i ∈ 𝒫 separately, to discover and construct the corresponding
particular algorithm A_i.


But there are infinitely many instances of 𝒫 (otherwise, 𝒫 would be computable),
so the construction of all of the particular algorithms would never end. Even if the
construction somehow finished, the encoding of such an A would not be finite, and
this would violate the fundamental assumption that algorithms are representable
by finite programs. (Which Turing program would correspond to A?) Such an A
would not be an algorithm in the sense of the Computability Thesis.


2. The second attempt avoids the construction of infinitely many particular
algorithms A_i. To do this, we first construct a generator that is capable of generating,
in succession, all of the programs. Specifically, we can easily construct a
generator G_{Σ*} that generates the programs in shortlex order (that is, in order of
increasing length, and, in the case of equal length, in lexicographical order; see
Sect. 6.3.5 and Appendix A, p. 369). Then we can use G_{Σ*} as an element in the
construction of some other algorithm. In particular, it seems that we can construct
a single, finite, and general algorithm B that is capable of solving the problem 𝒫:


Algorithm B(p):
begin
   repeat
      P := call G_{Σ*} and generate the next program;
   until P can solve p;
   Start P on input p;
end.


Unfortunately, there are pitfalls in this attempt too. First, we have tacitly assumed
that, for each instance p ∈ 𝒫, there exists a particular program P that solves p. Is
this assumption valid? Well, such a program might use a table containing the
expected input data (i.e., those that define the instance p), and the answer to p. Upon
receiving the actual input the program would check whether or not these define
the instance p and, if they do, it would return the associated answer. But now
another question arises: Who will (pre)compute the answers to various instances
and fill them in the tables? Secondly, who will decide (i.e., verify) whether or not
the next generated program P can solve the instance p ∈ 𝒫? The tables would
enable such a verification, but the assumption that such tables are known is
unrealistic. So, the verification should be done by a single algorithm. Such a verifier
should be capable of deciding, for an arbitrary pair P and p ∈ 𝒫, whether or
not P correctly solves p. But the existence of such a verifier would imply that B
correctly solves 𝒫, and hence 𝒫 is computable. This would be a contradiction.
Moreover, if the verifier could decide, for an arbitrary problem 𝒫, the question
“Does P solve p ∈ 𝒫?”, then also the incomputable problem CORRECTNESS OF
ALGORITHMS (PROGRAMS) would be computable (see Sect. 8.3.3). Again this
would be a contradiction.
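
The generator used in the second attempt is the easy part of Algorithm B. A minimal sketch of a shortlex generator follows; it assumes that programs are coded as words over a finite alphabet, and the function name shortlex is hypothetical.

```python
from itertools import count, islice, product

def shortlex(alphabet):
    """Yield every word over `alphabet`: shorter words first, ties in lexicographical order."""
    symbols = sorted(alphabet)
    for length in count(0):                       # lengths 0, 1, 2, ...
        for letters in product(symbols, repeat=length):
            yield "".join(letters)

# The first seven words over {0, 1}, starting with the empty word.
first_words = list(islice(shortlex("01"), 7))
print(first_words)
```

Every word, and hence every program code, appears after finitely many steps. What cannot be supplied is the test "until P can solve p" in Algorithm B, for the reasons given above.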


NB Should anyone construct an algorithm and claim that the algorithm can solve 
an incomputable problem, then we will be able—by just appealing to the Computa- 
bility Thesis—to rebut the claim right away. Indeed, without any analysis of the 
algorithm, we will be able to predict that the algorithm fails for at least one instance 
of the problem; when solving such an instance, the algorithm either never halts or 
returns a wrong result. Moreover, we will know that this holds for any present or 
future programming language (to code the algorithm), and for any present or future
computer (to run the program).


8.5 Chapter Summary


There are several kinds of computational problems: decision, search, counting, and 
generation problems. The solution of a decision problem is one of the answers YES 
or NO. An instance of a decision problem is positive or negative if the solution of 
the instance is YES or NO, respectively. 

There is a close link between decision problems and sets. Because of this we can 
reduce questions about decision problems to questions about sets. The language of 
a decision problem is the set of codes of all the positive instances of the problem. 
The question of whether an instance of a decision problem is positive reduces to the 
question of whether the code of the instance is in the corresponding language. 

A decision problem is decidable, undecidable, or semi-decidable if its language 
is decidable, undecidable, or semi-decidable, respectively. 
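
The link between a decision problem and its language can be made concrete with a small decidable example. The sketch below assumes, for illustration only, the problem "Is n prime?" with instances coded in binary without leading zeros; the names code and in_language are hypothetical.

```python
from math import isqrt

def code(n):
    """Code of the instance n: its binary representation without leading zeros."""
    return format(n, "b")

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def in_language(word):
    """Decide whether `word` is the code of a positive instance."""
    is_code = word != "" and set(word) <= {"0", "1"} and (word == "0" or word[0] == "1")
    return is_code and is_prime(int(word, 2))

print(in_language(code(7)), in_language(code(8)))
```

Asking whether the instance 7 is positive is the same as asking whether the word 111 belongs to the language.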

Given a Turing machine T and a word w, the Halting Problem asks whether or 
not T eventually halts on w. The Halting Problem is undecidable. There are un- 
decidable problems that are semi-decidable; such is the Halting Problem. There are 
also undecidable problems that are not even semi-decidable; such is the non-Halting 
Problem, which asks whether or not T never halts on w. 
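
Semi-decidability of the Halting Problem amounts to "simulate and wait". The sketch below is only an analogy under a loudly stated assumption: a program is modeled as a Python generator function in which each yield marks one computation step. The names halts_within, semi_decide_halting, countdown, and forever are hypothetical.

```python
def halts_within(program, word, max_steps):
    """True if `program` halts on `word` after fewer than `max_steps` steps."""
    run = program(word)
    for _ in range(max_steps):
        try:
            next(run)               # perform one step
        except StopIteration:
            return True             # the program has halted
    return False                    # no verdict: it may halt later or never

def semi_decide_halting(program, word):
    """Answer YES (True) if the program halts; otherwise never return."""
    for _ in program(word):
        pass
    return True

def countdown(word):                # halts after len(word) steps
    for _ in word:
        yield

def forever(word):                  # never halts
    while True:
        yield

print(semi_decide_halting(countdown, "aaa"), halts_within(forever, "", 1000))
```

A False from halts_within is not a NO answer; turning it into one would require knowing a step bound in advance, which is exactly what undecidability rules out.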

An n-state stopper is a TM that has n ≥ 1 non-final states and one final state,
and a program that always writes the symbol 1 and moves the window and which,
if started on an empty input, eventually halts and leaves a number of 1s on its tape.
An n-state Busy Beaver is an n-state stopper that outputs, from among all the n-state
stoppers, the maximum number of 1s. The problem of deciding whether or not a
TM is an n-state Busy Beaver for any n is undecidable. The number of 1s output by
an n-state Busy Beaver is a function of n that is incomputable.
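
To make the notion of a stopper concrete, here is a minimal simulator sketch. The representation is an assumption made for illustration: the program is a dictionary from (state, scanned symbol) to (new state, move), the blank is written "_", and the step limit max_steps is an arbitrary cut-off. The example machine is merely some 2-state stopper; no claim is made that it is a Busy Beaver.

```python
from collections import defaultdict

BLANK = "_"

def run_stopper(delta, final_state, start_state="q1", max_steps=10_000):
    """Run on the empty input; return the number of 1s left, or None if the limit is hit."""
    tape = defaultdict(lambda: BLANK)
    state, pos, steps = start_state, 0, 0
    while state != final_state:
        if steps == max_steps:
            return None             # undetermined: the machine may or may not halt later
        state, move = delta[(state, tape[pos])]
        tape[pos] = "1"             # a stopper always writes the symbol 1
        pos += 1 if move == "R" else -1
        steps += 1
    return sum(1 for symbol in tape.values() if symbol == "1")

# A 2-state machine (non-final states q1, q2; final state q3).
example = {
    ("q1", BLANK): ("q2", "R"), ("q1", "1"): ("q3", "L"),
    ("q2", BLANK): ("q1", "L"), ("q2", "1"): ("q3", "R"),
}
print(run_stopper(example, "q3"))
```

The cut-off is what keeps this sketch from contradicting the text: no computable choice of max_steps works for all n, since otherwise the Busy Beaver function would be computable.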

There are many other and more practical undecidable decision problems (and 
incomputable computational problems) in various fields of science. 


Problems 


8.1. Given a search problem (see Sect. 8.1.1), we can derive from it the corresponding decision 
problem. For example, from the search problem P(S) = “Find the largest prime in S.” we 
derive a decision problem D(S,n) = “Does S contain a prime larger than n?” 

Discuss the following questions: 


(a) Can we construct an algorithm for P(S), given a decider for D(S,n)? How? 


(b) Can a search problem be computable if its derived decision problem is undecidable? 


8.2. Similarly to search problems, counting and generation problems can also be associated 
in a natural way with the corresponding decision problems. 


(a) Give examples of counting and generation problems and the derived decision problems. 


(b) Can a counting or generation problem be computable if its decision counterpart is unde- 
cidable? 


8.3. Let n ≥ 1. Define 𝒯_n to be the set of all the two-way unbounded TMs T = (Q, Σ, Γ, δ, q_1, ⊔, F),
where Q = {q_1, ..., q_{n+1}}, Σ = {0,1}, Γ = {0,1,⊔}, δ : Q × Γ → Q × {1} × {Left, Right},
and F = {q_{n+1}}.


(a) Prove: 𝒯_n contains finitely many TMs, for any n ≥ 1.


(b) Compute |𝒯_n|.


Chapter 9
Methods of Proving Incomputability


A method is a particular way of doing something. 


Abstract How can we prove the undecidability of the problems listed in the previ- 
ous chapter? Are there any general methods of proving the undecidability of deci- 
sion problems? The answer is yes: Today we have at our disposal several such meth- 
ods. These are: 1) proving by diagonalization, 2) proving by reduction, 3) proving 
by the Recursion Theorem, and 4) proving by Rice’s Theorem. In addition, we can 
apply the results discovered by the relativization of the notion of computability. In 
this chapter we will describe the first four methods and postpone the discussion of 
relativized computability—which needs a separate, more extensive treatment—to 
the following chapters. 


9.1 Proving by Diagonalization 


We have already encountered diagonalization twice: first, when we proved the
existence of an intuitively computable numerical function that is not both μ-recursive
and total (Sect. 5.3.3), and second, when we proved the undecidability of the
Halting Problem (Sect. 8.2). In these two cases diagonalization was used in two
different ways, directly and indirectly. In this section we will give general
descriptions of both.


9.1.1 Direct Diagonalization 


This method was first used by Cantor in 1891 in his proof that 2^ℕ, the power set of
ℕ, is uncountable. The generalized idea of the method is as follows.

Let P be a property and S = {x | P(x)} be the class of all the elements with
property P. Suppose that we are given a countable set 𝒯 ⊆ S such that 𝒯 =
{e_0, e_1, e_2, ...} and each e_i can be uniquely represented by an ordered set, that is,
e_i = (c_{i,0}, c_{i,1}, c_{i,2}, ...), where c_{i,j} are members of a set C (say, natural numbers).
We may ask whether the elements of S can all be exhibited simply by listing the
elements of 𝒯. In other words, we may ask


“Is 𝒯 = S?”


If we doubt this, we can embark on a proof that 𝒯 ⊊ S. To prove this, we can try
the following method, called diagonalization.

Imagine a table T (see Fig. 9.1) with the elements e_0, e_1, e_2, ... on its vertical
axis, the numbers 0, 1, 2, ... on the horizontal axis, and the entries T(i,j) = c_{i,j}.


Fig. 9.1 By switching each component of the diagonal d = (c_{0,0}, c_{1,1}, c_{2,2}, ...) we obtain sw(d),
the switched diagonal. But sw(d) differs from every e_i, because sw(c_{i,i}) ≠ c_{i,i} for every i ∈ ℕ


The diagonal components c_{i,i} define an object d, called the diagonal:
d = (c_{0,0}, c_{1,1}, c_{2,2}, ...).


Let sw : C → C be a function such that sw(c) ≠ c for every c ∈ C. We will call sw
the switching function. Define the switched diagonal to be the object


sw(d) = (sw(c_{0,0}), sw(c_{1,1}), sw(c_{2,2}), ...).


Now observe that, for every i, the switched diagonal sw(d) differs from the element
e_i because sw(c_{i,i}) ≠ c_{i,i} (see Fig. 9.1). Consequently, sw(d) ∉ 𝒯 = {e_0, e_1, e_2, ...}.
This, however, does not yet prove 𝒯 ⊊ S unless sw(d) ∈ S. Therefore, if sw(d) has
the property P, then sw(d) ∈ S, and hence 𝒯 ⊊ S.

In sum, if we can find a switching function sw such that the switched diagonal
sw(d) has the property P, then 𝒯 ≠ S. The steps of the method are then as follows.

Method. (Proof by Diagonalization) Given a property P and a countable set
𝒯 = {e_0, e_1, e_2, ...} of elements with property P, then to prove that 𝒯 ⊊ S,
where S is the class of all the elements with property P, proceed as follows:


1. Uniquely represent each element e_i of the set 𝒯 by an ordered set, that is,
e_i = (c_{i,0}, c_{i,1}, c_{i,2}, ...), where c_{i,j} ∈ C and C is a set;

2. Let T be a table with the elements e_0, e_1, e_2, ... of 𝒯 on its vertical axis,
the numbers 0, 1, 2, ... on its horizontal axis, and entries T(i,j) = c_{i,j};


3. Try to find a function sw : C → C with the following properties:
• sw(c) ≠ c for every c ∈ C;
• The object (sw(c_{0,0}), sw(c_{1,1}), sw(c_{2,2}), ...) has property P;
4. If such a function sw is found, then 𝒯 ⊊ S.
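
The steps of the method can be sketched in a few lines. This is an illustration only: it assumes that the table T is available as a function of (i, j) and that C is the set of natural numbers; the names switched_diagonal, table, and g are hypothetical.

```python
def switched_diagonal(table, sw):
    """Return sw(d) as a function of n, where d is the diagonal of `table`."""
    return lambda n: sw(table(n, n))

# Illustrative table: row i represents the sequence c_{i,j} = i * j.
table = lambda i, j: i * j
sw = lambda c: c + 1                # sw(c) != c for every natural number c
g = switched_diagonal(table, sw)

# g differs from row i in column i; here this is checked for the first 1000 rows.
print(all(g(i) != table(i, i) for i in range(1000)))
```

The check confirms only that sw(d) is not among the listed rows. Whether sw(d) has property P (step 3, second condition) must still be argued separately for each application.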


Example 9.1. (ℕ vs. ℝ) Let us prove that there are more real numbers than natural numbers. We
focus on real numbers in the open interval (0,1) ⊂ ℝ and show that the interval is uncountable.
Define the property P(x) = “x is a real number in (0,1).” So S = (0,1). Let 𝒯 = {e_0, e_1, e_2, ...}
be an arbitrary countable set, where e_i ∈ S for all i, and each e_i is uniquely represented as e_i =
0.c_{i,0}c_{i,1}c_{i,2}..., where c_{i,j} ∈ C = {0,1,2,3,4,5,6,7,8,9}. The uniqueness of the representation is
achieved if those e_i that would have finitely many non-zero digits (e.g., 0.25) are represented by
using an infinite sequence of 9s (e.g., 0.24999...). Clearly, 𝒯 ⊆ S. Using diagonalization we then
prove that 𝒯 ≠ S. How? Imagine a table T (see Fig. 9.1) with the elements e_0, e_1, e_2, ... on its
vertical axis, the numbers 0, 1, 2, ... on the horizontal axis, and the entries T(i,j) = c_{i,j}. Define the
switching function by sw : c ↦ c + 1 (mod 10). The switched diagonal sw(d) represents a number
0.c'_{0,0}c'_{1,1}c'_{2,2}..., where c'_{i,i} = sw(c_{i,i}), which is in (0,1) = S but differs from every e_i (as c'_{i,i} ≠ c_{i,i}).
Hence, sw(d) ∈ S − 𝒯 and 𝒯 ≠ S. Since 𝒯 was an arbitrary countable subset of S = (0,1), it
follows that (0,1), and therefore ℝ, cannot be countable.


Example 9.2. (Intuitively Computable but Not Total μ-Recursive Function) Let us try to prove the
existence of numerical functions that are intuitively computable but not total μ-recursive.

Define the property P(f) = “f is an intuitively computable numerical function” and let S be
the class of all such functions. Recall (Box 5.1, p. 82) that every (total or non-total) μ-recursive
function is uniquely defined by its construction. Since a construction is a finite word over a finite
alphabet, say {0,1}, it can be interpreted as a natural number. Thus, given arbitrary n ∈ ℕ, the nth
construction (and the corresponding total or non-total μ-recursive function) is precisely defined.
Suppose that we are given a set 𝒯 = {f_0, f_1, f_2, ...}, where f_i is the ith total μ-recursive function
in the sequence of all (total and non-total) μ-recursive functions. Each f_i is perfectly defined by its
values on ℕ, so we can represent it by (c_{i,0}, c_{i,1}, c_{i,2}, ...), where c_{i,j} = f_i(j) ∈ ℕ = C and j ∈ ℕ.
Let T be a table with f_0, f_1, f_2, ... on its vertical axis, 0, 1, 2, ... on the horizontal axis, and the
entries T(i,j) = f_i(j). Let the switching function be sw : c ↦ c + 1. Clearly, sw(d) represents a
total function. To compute its value at some n ∈ ℕ one locates the n-th row of T and adds 1 to T(n,n).
So sw(d) ∈ S. But sw(d) differs from every f_i as sw(d) and f_i attain different values at i. Thus,
sw(d) ∈ S − 𝒯, and the function sw(d) is intuitively computable but not total μ-recursive.

Remark. We began with a proof that the set 𝒯 of all total μ-recursive functions is enumerable,
i.e., 𝒯 = {f_0, f_1, f_2, ...}. In the Platonic view, the set {f_0, f_1, f_2, ...} exists (and so does 𝒯),
even if the constructions of the functions f_i are not known to us. Thus, only the supposition that 𝒯
can be given as {f_0, f_1, f_2, ...} enabled us to prove the existence¹ of sw(d) ∈ S − 𝒯.


¹ But to really compute the value of sw(d) at n ∈ ℕ, we would need the actual contents of T(n,n), so
we should know how f_n is constructed. Can we algorithmically find the construction of f_n, given n?
Can we decide whether or not a given construction defines a total μ-recursive function?


9.1.2 Indirect Diagonalization 


First, recall that we encoded Turing machines T by encoding their programs by
words ⟨T⟩ ∈ {0,1}* and, consequently, by natural numbers that we called indexes
(see Sect. 6.2.1). We then proved that every natural number is the index of exactly
one Turing program. Hence, we can speak of the first, second, third, and so on
Turing program, or, due to the Computability Thesis, of the first, second, third, and
so on algorithm. So there is a sequence A_0, A_1, A_2, ... such that every algorithm that
we can conceive of is an A_i for some i ∈ ℕ. Remember that A_i is encoded (described)
by its index i.


Fig. 9.2 Table T contains the results A_i(j) of applying every algorithm A_i on every input j. The
diagonal values are A_i(i), i = 0, 1, 2, .... The shrewd algorithm is S. If applied on its own code, S
uncovers the inability of the alleged decider D_P to decide whether or not S has the property P


Second, imagine an infinite table T (see Fig. 9.2) with A_0, A_1, A_2, ... on the
vertical axis, and 0, 1, 2, ... on the horizontal axis. Define the components of T as
follows: For each pair i, j ∈ ℕ, let the component T(i,j) = A_i(j), the result of algorithm
A_i when given the input j.

Now let P be a property that is sensible for algorithms. Consider the question 


“Is there an algorithm D_P capable of deciding, for an arbitrary algorithm A,
whether or not A has property P?”


If it exists, the algorithm D_P is the decider of the class of algorithms that have
property P. If we doubt that D_P exists, we can try to prove this with the following
method, which is a generalization of the method used to prove Lemma 8.1 (p. 182).

Method. (Proof by Diagonalization) To prove that there is no algorithm D_P
capable of deciding whether or not an arbitrary algorithm has a given property
P, proceed as follows:


1. Suppose that the decider D_P exists.
2. Try to construct a shrewd algorithm S with the following properties:
• S uses D_P;
• if S is given as an input its own code ⟨S⟩, it uncovers the inability of D_P
to decide whether or not S has the property P.


3. If such an algorithm S is constructed, then D_P does not exist, and the
property P is undecidable.


The second step requires S to expose (probably in some shrewd way) the inability
of the alleged D_P to output the correct answer on the input ⟨S⟩. But how do we
construct such an S? Initially, S must call D_P and hand over to it the input ⟨S⟩. By
supposition, D_P will answer either YES (i.e., S has property P) or NO (i.e., S does not
have property P). Then, and this is the hard part of the proof, we must construct
the rest of S in such a way that the resulting S will have the property P if and only
if D_P has decided the contrary. If we succeed in constructing such an S, then the
alleged D_P errs on at least one input, namely ⟨S⟩, contradicting our supposition in
the first step. Consequently, the decider D_P does not exist and the decision problem
“Does A have the property P?” is undecidable.

Of course, the existence of S and its construction both depend on property P. If
S exists, it may take considerable ingenuity to construct it.


Example 9.3. (Shrewd Algorithm for the Halting Problem) Is there a decider D_P capable of
deciding, for an arbitrary TM, whether the TM halts on its own code?

We construct the shrewd TM S as follows. When S reads ⟨S⟩, it sends it to D_P to check whether
S halts on ⟨S⟩. If D_P has answered YES (i.e., saying that S halts on ⟨S⟩), we must make sure that the
following execution of S will never halt. This is why we loop the execution back to D_P. Otherwise,
if D_P has answered NO (i.e., indicated that S does not halt on ⟨S⟩), we force S to do just the contrary:
So S must halt immediately. The algorithm S is depicted in Fig. 8.3 on p. 182.
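
The construction of S can be mimicked in Python. This is a sketch under stated assumptions, not the TM of Fig. 8.3: an "algorithm" is a Python function, its "code" is the function object itself, and the infinite loop is a plain while loop rather than a jump back to the decider. The names make_shrewd and always_no are hypothetical.

```python
def make_shrewd(alleged_decider):
    """Build S from an alleged decider for 'A halts on its own code'."""
    def S(code):
        if alleged_decider(code):   # answer YES: the decider claims a halt ...
            while True:             # ... so S does the contrary and never halts
                pass
        return "halted"             # answer NO: S halts immediately
    return S

# A candidate decider that always answers NO.
always_no = lambda code: False
S = make_shrewd(always_no)

# The candidate claims that S does not halt on its own code, yet S does halt.
print(always_no(S), S(S))
```

A candidate that answers YES on S is refuted in the same way, although the refutation cannot be observed by running S(S), since that call would never return.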


Remark. We can apply this method even if we use some other equivalent model of computation.
In that case, A_i still denotes an algorithm, but the notion of the algorithm is now defined according
to the chosen model of computation. For example, if the model of computation is the λ-calculus,
then A_i is a λ-term.


9.2 Proving by Reduction


Until the undecidability of the first decision problem was proved, diagonalization 
was the only method capable of producing such a proof. Indeed, we have seen that 
the undecidability of the Halting Problem was proven by diagonalization. However, 
after the undecidability of the first problem was proven, one more method of proving 
undecidability became applicable. The method is called reduction. In this section we 
will first describe reduction in general, and then focus on two special kinds of this 
method, the so-called m-reduction and 1-reduction. 


9.2.1 Reductions in General 


Suppose we are given a problem P. Instead of solving P directly, we might try to 
solve it indirectly by executing the following scenario (Fig. 9.3): 


1. express P in terms of some other problem, say Q; 
2. solve Q; 
3. construct the solution to P by using the solution to Q only. 


If 1 and 3 have actually been implemented, we say that P has been reduced to Q. 
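
The three steps can be sketched with a deliberately simple pair of computable problems, chosen only for illustration: P is "find the minimum of a non-empty list of integers" and Q is "find the maximum of a non-empty list of integers". The function names are hypothetical.

```python
def express(p_instance):
    """Step 1: turn an instance of P into an instance of Q by negating every element."""
    return [-x for x in p_instance]

def solve_Q(q_instance):
    """Step 2: solve Q (here with Python's built-in max)."""
    return max(q_instance)

def construct(q_solution):
    """Step 3: build the solution to P from the solution to Q only."""
    return -q_solution

def solve_P(p_instance):
    return construct(solve_Q(express(p_instance)))

print(solve_P([3, -1, 7]))
```

Note that solve_P never inspects its input directly; all the work is delegated to the solver for Q, which is the sense in which P has been reduced to Q.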


Fig. 9.3 Solving the problem P is reduced to (or substituted by) solving the problem Q: 1. express P in terms of Q, 2. solve Q, 3. construct the solution to P


How can we implement steps 1 and 3? To express P in terms of Q we need a
function, call it r, that maps every instance p ∈ P into an instance q ∈ Q. Such an
r must meet two basic conditions:


a. Since problem instances are encoded by words in Σ* (see Sect. 8.1.2), r must be
a function r : Σ* → Σ* that assigns to every code ⟨p⟩ a code ⟨q⟩, where p ∈ P
and q ∈ Q.

b. Since we want to be able to compute ⟨q⟩ = r(⟨p⟩) for arbitrary ⟨p⟩, the function
r must be computable on Σ*.

To be able to construct the solution to P from the solution to Q we add one more 
basic condition: 


c. The function r must be such that, for arbitrary p ∈ P, the solution to p can be
computed from the solution to q ∈ Q, where ⟨q⟩ = r(⟨p⟩).

If such a function r is found, it is called the reduction of problem P to problem
Q. In this case we say that P is reducible to Q, and denote the fact by


P ≤ Q.


The informal interpretation of the above relation is as follows.
If P ≤ Q, then the problem P is not harder to solve than the problem Q.


The reduction r may still be too general to serve a particular, intended purpose.
In this case we define a set C of additional conditions to be fulfilled by r. If such a
function r is found, we call it the C-reduction of problem P to problem Q, and say
that P is C-reducible to Q. We denote this by


P ≤_C Q.


What sort of conditions should be included in C? The answer depends on the kind
of problems P, Q and on our intentions. For example, P and Q can be decision,
search, counting, or generation problems. Our intention may be to employ reduction
in order to see whether some of the problems P and Q have a certain property.
For instance, some properties of interest are (i) the decidability of a problem, (ii)
the computability of a problem, (iii) the computability of a problem in polynomial
time, and (iv) the approximability of a problem. The C-reductions that correspond to
these properties are the m-reduction (≤_m), Turing reduction (≤_T), polynomial-time
bounded Turing reduction (≤_T^P), and L-reduction (≤_L), respectively.

In the following we will focus on the m-reduction (≤_m) and postpone the discussion
of the Turing reduction (≤_T) and some of its stronger cases to Chaps. 11
and 14. The resource-bounded reductions, such as ≤_m^P, ≤_T^P, and ≤_L, are used in
Computational Complexity Theory, so we will not discuss them.


9.2.2 The m-Reduction 


If P and Q are decision problems, then they are represented by the languages 
L(P) ⊆ Σ* and L(Q) ⊆ Σ*, respectively. We use this in the next definition.


Definition 9.1. (m-Reduction) Let P and Q be decision problems. A reduction
r : Σ* → Σ* is said to be an m-reduction of P to Q if the following additional
condition is met:


C: ⟨p⟩ ∈ L(P) ⟺ r(⟨p⟩) ∈ L(Q), for every p ∈ P.


In this case we say that P is m-reducible to Q and denote this by P ≤_m Q.

The condition C enforces that r transforms the codes of the positive instances
p ∈ P into the codes of the positive instances q ∈ Q, and the codes of the negative
instances p ∈ P into the codes of the negative instances q ∈ Q. This is depicted in
Fig. 9.4.


Fig. 9.4 m-reduction of a decision problem P to a decision problem Q. Instead
of solving P directly, we show how to express an arbitrary instance p of P
with an instance q of Q, so that the answer to q will also be the answer to p.
The mapping r : ⟨p⟩ ↦ ⟨q⟩ must be computable, but not necessarily injective


Obviously, the m-reduction r maps the set L(P) into the set L(Q), i.e., r(L(P)) ⊆ L(Q).
Here, we distinguish between two possible situations:


1. r(L(P)) ⊊ L(Q). In this case r transforms P into a (proper) subproblem of Q.
We say that problem P is properly contained in problem Q.


2. r(L(P)) = L(Q). In this case r merely restates P into Q. We say that problem P
is equal to problem Q.
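
To make the condition C and the first of the two situations concrete, here is a small Python check on two toy languages over the alphabet {0, 1}. The languages, the function r, and all names are our own assumptions chosen for illustration: L(P) is the set of words of even length and L(Q) is the set of palindromes. The check covers only words of length at most 6, so it is a sanity test rather than a proof (for this particular r the condition C holds for all words by construction).

```python
from itertools import product

def in_LP(w: str) -> bool:      # toy L(P): words of even length
    return len(w) % 2 == 0

def in_LQ(w: str) -> bool:      # toy L(Q): palindromes
    return w == w[::-1]

def r(w: str) -> str:           # candidate m-reduction of P to Q (not injective)
    return "0" if in_LP(w) else "01"

def words(max_len: int):
    """Yield all words over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for t in product("01", repeat=n):
            yield "".join(t)

sample = list(words(6))
condition_c = all(in_LP(w) == in_LQ(r(w)) for w in sample)   # C on the sample
image_of_LP = {r(w) for w in sample if in_LP(w)}             # r(L(P)) on the sample

print(condition_c)    # True
print(image_of_LP)    # {'0'}
```

Since the palindrome 11 is in L(Q) but not in the image of L(P), this r illustrates situation 1: r(L(P)) is a proper subset of L(Q).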


But Fig. 9.4 reveals even more: 


a) Suppose that L(Q) is a decidable set. Then L(P) is also a decidable set.


Proof. Given an arbitrary ⟨p⟩ ∈ Σ*, we compute the answer to the question ⟨p⟩ ∈? L(P) as
follows: 1) compute r(⟨p⟩) and 2) ask r(⟨p⟩) ∈? L(Q). Since L(Q) is decidable, the answer to
the question can be computed. If the answer is YES, then r(⟨p⟩) ∈ L(Q) and, because of the
condition C in the definition of the m-reduction, also ⟨p⟩ ∈ L(P). If, however, the answer is
NO, then r(⟨p⟩) ∉ L(Q) and hence ⟨p⟩ ∉ L(P). Thus, the answer to the question ⟨p⟩ ∈? L(P)
can be computed for an arbitrary ⟨p⟩ ∈ Σ*.


Summary: Let P ≤_m Q. Then: L(Q) is decidable ⟹ L(P) is decidable.

b) Suppose that L(Q) is a semi-decidable set. Then L(P) is a semi-decidable set.


Proof. Given an arbitrary ⟨p⟩ ∈ Σ*, ask ⟨p⟩ ∈? L(P). If in truth ⟨p⟩ ∈ L(P), then r(⟨p⟩) ∈ L(Q)
by C. Since L(Q) is semi-decidable, the question r(⟨p⟩) ∈? L(Q) is answered with YES. At the
same time this is the answer to the question ⟨p⟩ ∈? L(P). If, however, in truth ⟨p⟩ ∉ L(P),
then r(⟨p⟩) ∉ L(Q). Since L(Q) is semi-decidable, the question r(⟨p⟩) ∈? L(Q) is either
answered with NO or not answered at all. At the same time this is the outcome of the question
⟨p⟩ ∈? L(P).

So: Let P ≤_m Q. Then: L(Q) is semi-decidable ⟹ L(P) is semi-decidable.


We have proved the following theorem. 


Theorem 9.1. Let P and Q be decision problems. Then: 


a) P ≤_m Q ∧ Q decidable problem ⟹ P decidable problem
b) P ≤_m Q ∧ Q semi-decidable problem ⟹ P semi-decidable problem
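
Both parts of the theorem amount to composing r with a machine for L(Q). The following Python sketch illustrates this under our own assumptions: a recognizer is modeled as a generator that yields once per step and returns when it accepts, and the helper run gives it a finite step budget so that the example always terminates. The toy languages (perfect squares in decimal for Q, their binary numerals for P) and all names are hypothetical.

```python
import math
from typing import Iterator, Optional

def decider_for_p(r, decide_q):
    """Part a): a decider for L(P) built from r and a decider for L(Q)."""
    return lambda w: decide_q(r(w))

def recognizer_for_p(r, recognize_q):
    """Part b): a recognizer for L(P) built from r and a recognizer for L(Q)."""
    return lambda w: recognize_q(r(w))

def run(steps: Iterator[None], fuel: int) -> Optional[bool]:
    """Advance a recognizer for at most `fuel` steps.
    True means it halted and accepted; None means no answer within the budget."""
    for _ in range(fuel):
        try:
            next(steps)
        except StopIteration:
            return True
    return None

# Toy L(Q): decimal numerals of perfect squares, recognized by unbounded search.
def recognize_square(w: str) -> Iterator[None]:
    n, k = int(w), 0
    while k * k != n:      # never terminates if n is not a perfect square
        k += 1
        yield

def decide_square(w: str) -> bool:
    return math.isqrt(int(w)) ** 2 == int(w)

# Toy L(P): binary numerals of perfect squares; r converts binary to decimal.
def r(w: str) -> str:
    return str(int(w, 2))

decide_p = decider_for_p(r, decide_square)
recognize_p = recognizer_for_p(r, recognize_square)

print(decide_p("1001"), decide_p("1010"))   # True False   (9 and 10)
print(run(recognize_p("1001"), 100))        # True
print(run(recognize_p("1010"), 100))        # None
```

The answer None reflects the situation "not answered at all" in the proof of part b): within any finite budget, the recognizer gives no verdict on a negative instance.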


Taking the contraposition of case a) of Theorem 9.1, substituting U for P, and assuming
that U ≤_m Q, we obtain the following important corollary. (The contraposition of case b)
gives a similar result for problems that are not semi-decidable; recall that a set that is not
semi-decidable must be undecidable, see Fig. 8.4 on p. 184.)


Corollary 9.1. U undecidable problem ∧ U ≤_m Q ⟹ Q undecidable problem


9.2.3 Undecidability and m-Reduction 


The above corollary is the backbone of the following method of proving that a decision
problem Q is undecidable. Informally, the method proves that a known undecidable
problem U is a subproblem of, or equal to, the problem Q.


Method. The undecidability of a decision problem Q can be proved as follows: 


1. Select: an undecidable problem U;
2. Prove: U ≤_m Q;
3. Conclude: Q is an undecidable problem.


Example 9.4. (Halting Problem Revisited) Using this method we proved the undecidability of
the Halting Problem D_Halt (see Sect. 8.2). To see this, select U = D_H and observe that U is a
subproblem of D_Halt. Note also that after we proved the undecidability of the problem D_H (by
diagonalization), D_H was the only choice for U.

Another method of proving the undecidability of a decision problem Q combines
proof by contradiction and case a) of Theorem 9.1 with U substituted for P.


Method. The undecidability of a decision problem Q can be proved as follows: 


1. Suppose: Q is a decidable problem; // Supposition.
2. Select: an undecidable problem U;
3. Prove: U ≤_m Q;
4. Conclude: U is a decidable problem; // 1 and 3 and Theorem 9.1a.
5. Contradiction between 2 and 4!
6. Conclude: Q is an undecidable problem.


Let us comment on steps 2 and 3 (the other steps are trivial). 


• Step 2: For U we select an undecidable problem that seems to be the most
promising one to accomplish step 3. This demands good knowledge of undecidable
problems and of problem Q, and often a mix of inspiration, ingenuity,
and luck. Namely, until step 3 is accomplished, it is not clear whether the task of
m-reducing U to Q is impossible or possible (but we are not equal to the task of
constructing this m-reduction). If we give up, we may want to try with another U.


• Step 3: Here, the reasoning is as follows: Since Q is decidable (as supposed
in step 1), there is a decider D_{L(Q)} that solves Q (i.e., answers the question
x ∈? L(Q) for an arbitrary x). We can use D_{L(Q)} as a building block and try
to construct a decider D_{L(U)}; this would be capable of solving U (by answering
the question w ∈? L(U) for an arbitrary w). If we succeed in this construction, we
will be able to conclude (in step 4) that U is decidable.


We now see how to tackle step 3: Using the supposed decider D_{L(Q)}, construct
the decider D_{L(U)}. A possible construction of D_{L(U)} is depicted in Fig. 9.5.


[Figure 9.5: diagram. If the supposed D_{L(Q)} solves the problem Q, then D_{L(U)} solves
the undecidable problem U. Outputs: YES, NO.]

Fig. 9.5 Suppose that D_{L(Q)} solves the decision problem Q. Using D_{L(Q)}, construct D_{L(U)}, which
solves an undecidable problem U. Then D_{L(Q)} cannot exist and Q is undecidable

The machine D_{L(U)} operates as follows: It reads an arbitrary input w ∈ Σ*, transforms
w into f(w) and hands this over to the decider D_{L(Q)}. The latter understands
this as the question f(w) ∈? L(Q) and always answers with YES or NO.
The answer D_{L(Q)}(f(w)) is then transformed by a function g and output to the
environment as the answer of the machine D_{L(U)} to the question w ∈? L(U).


What do the functions f and g look like? Of course, we expect f and g to depend
on the problems Q and U. However, because we want the constructed D_{L(U)} to
always halt, we must require the following:


- f and g are computable functions;

- g(D_{L(Q)}(f(w))) = YES if w ∈ L(U), and NO if w ∉ L(U), for every w ∈ Σ*.
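
The shape of this construction, D_{L(U)}(w) = g(D_{L(Q)}(f(w))), can be sketched in Python. Since no decider for an undecidable problem can actually be run, the sketch uses decidable stand-ins of our own choosing (U asks whether a word contains the letter a, Q asks the complementary question), so no contradiction arises here; the function and variable names are hypothetical.

```python
def build_decider(f, decide_q, g):
    """Compose D_L(U)(w) = g(D_L(Q)(f(w)))."""
    return lambda w: g(decide_q(f(w)))

# Decidable stand-ins (assumptions made for illustration only):
#   U: "does the word w contain the letter 'a'?"
#   Q: "is the word w free of the letter 'a'?"
def decide_q(w: str) -> bool:
    return "a" not in w

decide_u = build_decider(
    f=lambda w: w,                  # f: the identity function
    decide_q=decide_q,
    g=lambda answer: not answer,    # g: swap YES and NO
)

print(decide_u("banana"), decide_u("xyz"))   # True False
```

Both requirements are visible in the sketch: f and g are computable, and the composed answer agrees with membership in L(U) for every input.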


The next (demanding) example highlights the main steps in our reasoning when we 
use this method. To keep it as clear as possible, we will omit certain details and refer 
the reader to the Problems later in this chapter to fill in the missing steps. 


Example 9.5. (Computable Proper Set) Let Q = “Is L(T) computable?” We suspect that Q is
undecidable. Let us prove this by reduction. Suppose Q were decidable. Then there would exist a
decider D_{L(Q)} that could tell, for any ⟨T⟩, whether or not L(T) is computable. Now, what would be
the undecidable problem U for which a contradictory TM could be constructed by using D_{L(Q)}?
This is the tricky part of the proof, and to answer the question, a great deal of inspiration is needed.
Here it is. First, we show that there is a TM A which on input ⟨T,w⟩ constructs (the code of) a
TM B such that L(B) = K_0 (if T accepts w) and L(B) = ∅ (if T does not accept w). (Problem 9.5a.)
Then, using A and the alleged D_{L(Q)}, we can construct a recognizer R for K̄_0, the complement
of K_0. (Problem 9.5b.)
But this is a contradiction! The existence of R would mean that K̄_0 is c.e. (which it is not).
So, our supposition is wrong and Q is undecidable.


9.2.4 The 1-Reduction


Until now we did not discuss whether or not the reduction function r is injective. We
do this now. If r is not injective, then several instances, say m > 1, of the problem P
can transform into the same instance q ∈ Q. If, however, r is injective, then different
instances of P transform into different instances of Q. This is a special case of the
previous situation, where m = 1. In this case we say that P is 1-reducible to Q and
denote this by

P ≤_1 Q.


Since it holds that
P ≤_1 Q ⟹ P ≤_m Q,


we can try to prove the undecidability of a problem by using 1-reduction. Notice
that the general properties of m-reductions hold for 1-reductions too.

Example 9.6. (Halting Problem Revisited) Let us revisit the proof that the Halting Problem
D_Halt (= Q) is undecidable (see Sect. 8.2). After we proved that D_H is undecidable (see Lemma
8.1, p. 182), we took U = D_H. Then, the reasoning was: If D_Halt were decidable, then the associated
decider D_{L(D_Halt)} could be used to construct a decider for the problem D_H. This decider is
depicted in Fig. 9.6 a). The function f is total and computable since it only doubles the input ⟨T⟩.
So is g, because it is the identity function.


[Figure 9.6: three diagrams, a) to c); surviving labels: "accepts x", "accepts nothing".]

Fig. 9.6 Undecidability of problems: a) D_Halt; b) D̄, if D undecidable; c) D_Empty


Example 9.7. (Complementary Problem) If a problem D is undecidable, then the complementary
problem D̄ is undecidable. To prove this, take U = D, Q = D̄ and suppose that D̄ were decidable.
We could then use the supposed decider D_{L(D̄)}, which is the same as D_{Σ* ∖ L(D)}, to construct the
decider D_{L(D)}. This is depicted in Fig. 9.6 b). Now f is the identity function and g transforms an
arbitrary answer into the opposite one.


Example 9.8. (Empty Proper Set) Recall from Sect. 6.3.3 that the proper set of a Turing machine
is the set of all the words accepted by the machine. So, given a Turing machine T, does
the proper set L(T) contain any words? This is the decision problem D_Empty = “Is L(T) = ∅?”
Let us prove that it is undecidable. For the undecidable problem U we pick the Halting Problem
D_Halt = “Does T halt on w?” and suppose that D_Empty (= Q) is decidable. Then we can use the
associated decider D_{L(D_Empty)} to construct a decider for the language L(D_Halt)! The construction is
in Fig. 9.6 c). When the constructed machine reads an arbitrary input ⟨T,w⟩, it uses the function f
to construct the code ⟨T'⟩ of a new machine T'. This machine, if started, would operate as follows:
Given an arbitrary input x, the machine T' would simulate T on w and, if T halted, then T' would
accept x; otherwise, T' would not recognize x. Consequently, we have L(T') ≠ ∅ ⟺ T halts on w.
But the validity of L(T') ≠ ∅ can be decided by the supposed decider D_{L(D_Empty)}. After swapping
the answers (using the function g) we obtain the answer to the question “Does T halt on w?”
So the constructed machine is capable of deciding the Halting Problem. Contradiction.
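
The function f of this example, which builds T' from T and w, can be sketched in Python. The modeling choices are our own assumptions: a machine is a generator function that yields once per step and returns when it halts (halting of T' stands for accepting), machines are passed as Python functions rather than as codes ⟨T⟩, and the helper run imposes a finite step budget so the demonstration terminates.

```python
from typing import Callable, Iterator, Optional

Machine = Callable[[str], Iterator[None]]   # yields once per step; returning = halting

def run(machine: Machine, word: str, fuel: int) -> Optional[bool]:
    """True if `machine` halts on `word` within `fuel` steps, else None (unknown)."""
    steps = machine(word)
    for _ in range(fuel):
        try:
            next(steps)
        except StopIteration:
            return True
    return None

def f(T: Machine, w: str) -> Machine:
    """Build T': on any input x, simulate T on w and accept x if T halts."""
    def T_prime(x: str) -> Iterator[None]:
        yield from T(w)     # x is ignored; T' runs forever if T loops on w
    return T_prime

# Two hypothetical machines used only for the demonstration.
def halts_quickly(w: str) -> Iterator[None]:
    for _ in w:
        yield

def loops_forever(w: str) -> Iterator[None]:
    while True:
        yield

print(run(f(halts_quickly, "abc"), "any x", 50))   # True
print(run(f(loops_forever, "abc"), "any x", 50))   # None
```

If T halts on w, then T' accepts every x, so L(T') is nonempty; if T loops on w, then T' accepts nothing. The step budget can only confirm the first case, which is consistent with the undecidability of the Halting Problem.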


Example 9.9. (Entscheidungsproblem) Recall that the Decidability Problem for a theory ℱ is
the question “Is theory ℱ decidable?”. In other words, the problem asks whether there exists a
decider that, for arbitrary formula F ∈ ℱ, finds out whether or not F is provable in ℱ. Hilbert
was particularly interested in the Decidability Problem for M, the sought-for theory that would
formalize all mathematics, and, of course, in the associated decider D_Entsch. But Church and Turing
independently proved the following theorem.

Theorem 9.2. The Entscheidungsproblem D_Entsch is undecidable.


Proof. (Reduction D_H ≤_m D_Entsch.) Suppose the problem D_Entsch were decidable. Then there
would exist a decider D_Entsch capable of deciding, for an arbitrary formula of the Formal Arithmetic
A, whether or not the formula is provable in A (see Sects. 4.2.1, 4.2.2). Now consider the statement


“Turing machine T halts on input ⟨T⟩.” (*)


The breakthrough was made by Turing when he showed that this statement can be represented by a
formula in A. In the construction of the formula, he used the concept of the internal configuration
of the Turing machine T (see Definition 6.2 on p. 114). Let us designate this formula by


K(T).


Then, using the supposed decider D_Entsch, we could easily construct a decider D capable of deciding,
for arbitrary T, whether or not T halts on ⟨T⟩. The decider D is depicted in Fig. 9.7. Here, the
function f maps the code ⟨T⟩ into the formula K(T), while the function g is the identity function
and maps the unchanged answer of D_Entsch into the answer of D.


[Figure 9.7: diagram. If the supposed D_Entsch solves the problem “Is K(T) provable in A?”, then the
algorithm D solves the undecidable problem D_H = “Does T halt on ⟨T⟩?”. Outputs: YES, NO.]

Fig. 9.7 Solving the Halting Problem with the alleged decider for the Entscheidungsproblem


Suppose that the Formal Arithmetic A were as Hilbert expected, that is, consistent and complete.
Then, D’s answer would tell us, for arbitrary T, whether or not the statement (*) is true. But this
would in turn mean that the halting problem D_H is decidable, which it provably is not! We must
conclude that the problem D_Entsch is undecidable. (A fortiori, thanks to Gödel we know that Formal
Arithmetic A is not complete.) This completes the proof.


Thus, the decider D_Entsch does not exist. What about semi-decidability? Is it perhaps that
D_Entsch is a semi-decidable problem? The answer is yes.


Theorem 9.3. The Entscheidungsproblem D_Entsch is semi-decidable.


Proof. Let F ∈ M be an arbitrary formula. We have seen in Sect. 4.2.2 that we can systematically
generate all the sequences of symbols of M and, for each generated sequence, algorithmically
check whether or not it is a proof of F in M. If the formula F is in truth a theorem of M, then a
proof of F in M exists and will, therefore, be eventually generated and recognized. So the described
algorithm is a recognizer of the set of all mathematical theorems. If, however, F is in truth not
a theorem of M, then the recognizer will never halt. Nevertheless, this suffices for the problem
D_Entsch to be semi-decidable. This completes the proof.
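
The generate-and-check recognizer described in the proof can be sketched in Python on a deliberately tiny stand-in for a formal theory. Everything specific here is our own assumption: in the toy system a "formula" is a decimal numeral n, n is a "theorem" iff n is composite, and a "proof" is a string a*b with a, b > 1 and a·b = n. The optional budget parameter exists only so that the demonstration terminates on non-theorems; with budget=None the search is unbounded, as in the proof.

```python
from itertools import count, islice, product
from typing import Callable, Iterator, Optional

def all_strings(alphabet: str) -> Iterator[str]:
    """Enumerate every finite string over `alphabet`, shortest first."""
    for n in count(0):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

def recognize(formula: str,
              is_proof: Callable[[str, str], bool],
              alphabet: str,
              budget: Optional[int] = None) -> Optional[bool]:
    """Return True once a proof of `formula` has been generated.
    With budget=None the search may never return; otherwise None means
    that the budget was exhausted without an answer."""
    candidates = all_strings(alphabet)
    if budget is not None:
        candidates = islice(candidates, budget)
    for s in candidates:
        if is_proof(s, formula):
            return True
    return None

# Toy proof checker (an assumption): "a*b" proves n iff a, b > 1 and a * b == n.
def is_proof(s: str, formula: str) -> bool:
    parts = s.split("*")
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        return False
    a, b = int(parts[0]), int(parts[1])
    return a > 1 and b > 1 and a * b == int(formula)

ALPHABET = "0123456789*"
print(recognize("15", is_proof, ALPHABET))                # True (e.g., via "3*5")
print(recognize("13", is_proof, ALPHABET, budget=5000))   # None
```

The essential asymmetry is the one used in the proof: a theorem is eventually confirmed, whereas for a non-theorem the unbounded search would simply never halt.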


Remark. After 16 years, yet another goal of Hilbert’s Program received a negative response: 
Formal Arithmetic A is not a decidable theory. Thus, the recognition of arithmetical theorems 
cannot be fully automated, regardless of the kind of algorithms we might use to achieve this goal. 
Since A is a subtheory of M, the same holds for mathematics: No matter what kind of algorithms 
we use, only a proper subclass of all mathematical theorems can be algorithmically recognized. So 
there will always be mathematical theorems whose theoremship is algorithmically undecidable. 
After the researchers learned this, some focused on formal axiomatic systems and theories that 
are weaker than Formal Arithmetic yet algorithmically decidable. An example of such a theory is 
Presburger Arithmetic,² introduced in 1929. In this arithmetic the operation of multiplication is
omitted, thus allowing on N only the operation of addition and the equality relation. 


9.3 Proving by the Recursion Theorem 


Recall from Sect. 7.4 that the Recursion Theorem can be restated as the Fixed-Point 
Theorem, which tells us that every computable function has a fixed point. This re- 
veals the following method for proving the incomputability of functions. 


Method. (Incomputability of Functions) Suppose that a function g has no 
fixed point. Then, g is not computable (i.e., it is not total, or it is incomputable, 
or both). If we somehow prove that g is total, then g must be incomputable. 


We further develop this method into a method for proving the undecidability of
problems as follows. Let D be a decision problem that we want to prove is undecidable.
Suppose that D is decidable. Then the characteristic function χ_{L(D)} is
computable. We can use χ_{L(D)} as a (necessary) component and construct a function
g : N → N in such a way that g is computable. We then try to prove that g has no
fixed point. If we succeed in this, we have a contradiction with the Fixed-Point Theorem.
Thus, g cannot be computable and, consequently, χ_{L(D)} cannot be computable
either. So D is undecidable. We summarize this in the following method.


Method. (Undecidability of Problems) Undecidability of a decision problem 
D can be proved as follows: 


1. Suppose: D is a decidable problem;
2. Construct: a computable function g using the characteristic function χ_{L(D)};
3. Prove: g has no fixed point;
4. Contradiction with the Fixed-Point Theorem!
5. Conclude: D is an undecidable problem.


² Mojżesz Presburger, 1904-1943, Polish mathematician, logician, and philosopher.

Example 9.10. We will use this method to prove Rice’s Theorem in the next section.


Example 9.11. See the proof of the incomputability of a search problem in Sect. 9.5. 


9.4 Proving by Rice’s Theorem 


All the methods of proving the undecidability of problems that we have described 
until now may require a considerable degree of inspiration, ingenuity, and even luck. 
In contrast, the method that we will consider in this section is far less demanding. 
It is based on a theorem that was discovered and proved in 1951 by Rice.³ There are
three versions of the theorem: for partial computable functions, for index sets, and 
for computably enumerable sets. They state: 


for p.c. functions: Every non-trivial property of p.c. functions is undecidable. 
for index sets: Every index set different from ∅ and N is undecidable.
for c.e. sets: Every non-trivial property of c.e. sets is undecidable. 


The theorem reveals that the undecidability of certain kinds of decision problems 
is more of a rule than an exception. Let us see the details. 


9.4.1 Rice’s Theorem for P.C. Functions 


Let P be a property that is sensible for functions, and φ an arbitrary partial computable
function. We define the following decision problem:


D_P = “Does p.c. function φ have the property P?”


We will say that the property P is decidable if the problem D_P is decidable. Thus,
if P is a decidable property of p.c. functions, then there exists a Turing machine that
decides, for an arbitrary p.c. function φ, whether or not φ has the property P.

³ Henry Gordon Rice, 1920-2003, American logician and mathematician.

What are the properties P that we will be interested in? We will not be interested
in a property if it is such that a function has it in some situations and does not have
it in other situations. This could happen when the property depends on the way in
which function values are computed. For example, whether or not a function φ has
the property defined by P(φ) = “φ(n) is computed in n² steps” certainly depends on
the algorithm used to compute φ(n). It may also depend on the program (i.e., algorithm
encoding) as well as on the machine where the function values are computed.
(The machine can be an actual computer or an abstract computing machine.) For
this reason, we will only be interested in properties that are intrinsic to functions,
i.e., properties of functions where the functions are viewed only as mappings from
one set to another. Such properties are, because of their basic nature, insensitive to
the machine, algorithm, and program that are used to compute function values. For
instance, being total is an intrinsic property of functions, because every function φ
is either total or not total, irrespective of the way the φ values are computed.
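
The difference between an intrinsic property and one that depends on the algorithm can be illustrated with two Python programs for the same function n ↦ 0 + 1 + ... + n. This is our own toy example: the step counter is an ad hoc assumption (it counts loop iterations, not steps of any real machine), and the comparison is made only on the inputs 0, ..., 199.

```python
def sum_loop(n: int):
    """Return (value, steps) for 0 + 1 + ... + n computed by a loop."""
    total, steps = 0, 0
    for k in range(n + 1):
        total += k
        steps += 1          # ad hoc step count: one step per iteration
    return total, steps

def sum_formula(n: int):
    """Return (value, steps) for the same function computed by a closed formula."""
    return n * (n + 1) // 2, 1

same_mapping = all(sum_loop(n)[0] == sum_formula(n)[0] for n in range(200))
same_steps = all(sum_loop(n)[1] == sum_formula(n)[1] for n in range(200))

print(same_mapping, same_steps)   # True False
```

On the tested inputs both programs realize the same mapping, so any intrinsic property holds for both or for neither, while a property that mentions the number of steps distinguishes them.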

So let P be an arbitrary intrinsic property of p.c. functions. It may happen that 
every p.c. function has this property P; and it also may happen that no p.c. function 
has the property P. In each of these two cases we will say that the property P is trivial. 

This is where Rice’s Theorem enters. Rice discovered the following surprising 
connection between the decidability and the triviality of the intrinsic properties of 
partial computable functions. 


Theorem 9.4. (Rice’s Theorem for p.c. Functions) Let P be an arbitrary intrinsic 
property of p.c. functions. Then: 


P is decidable ⟺ P is trivial.


Proof. See below. 


9.4.2 Rice’s Theorem for Index Sets 


Before we embark on the proof of Theorem 9.4, we restate the theorem in an equiv- 
alent form that will be more convenient to prove. The new form will refer to index 
sets instead of p.c. functions. 

So let P be an arbitrary intrinsic property of p.c. functions. Define ℱ_P to be the
class of all p.c. functions having the property P,


ℱ_P = {ψ | ψ is a p.c. function with property P}.

Then the decision problem D_P defined above can be written as

D_P = φ ∈? ℱ_P.


Let ind(ℱ_P) be the set of indexes of all the Turing machines that compute any of
the functions in ℱ_P. Clearly ind(ℱ_P) = ⋃_{ψ ∈ ℱ_P} ind(ψ), where ind(ψ) is the index
set of the function ψ (see Sect. 7.2). Now the critical observation is: Because P
is insensitive to the program used to compute φ’s values, it must be that ind(ℱ_P)
contains either all of the indexes of the Turing machines computing φ, or none
of these indexes, depending on whether or not φ has property P. So the question
φ ∈? ℱ_P is equivalent to the question x ∈? ind(ℱ_P), where x is the index of an
arbitrary Turing machine computing φ. Hence,


D_P = x ∈? ind(ℱ_P).


From the last formulation of D_P it follows that

D_P is a decidable problem ⟺ ind(ℱ_P) is a decidable set.


But when is the set ind(ℱ_P) decidable? The answer is given by the following version
of Rice’s Theorem.


Theorem 9.5. (Rice’s Theorem for Index Sets) Let ℱ_P be an arbitrary set of p.c.
functions. Then:


ind(ℱ_P) is decidable ⟺ ind(ℱ_P) is either ∅ or N.


This form of Rice’s Theorem can now easily be proved by the Fixed-Point Theorem 
(see Box 9.1). 


Box 9.1 (Proof of Theorem 9.5). 


Both ∅ and N are decidable sets. Consequently, ind(ℱ_P) is decidable if it is either of the two.
Now, take an arbitrary ind(ℱ_P) that is neither ∅ nor N. Suppose that ind(ℱ_P) is decidable. Then
we can deduce a contradiction as follows. Because ind(ℱ_P) is a proper and nonempty subset of
N, there must be two different natural numbers a and b such that a ∈ ind(ℱ_P) and b ∈ N ∖ ind(ℱ_P).
Define a function f : N → N that maps every x ∈ ind(ℱ_P) into b and every x ∈ N ∖ ind(ℱ_P) into a.
Clearly, f is total. But it is also computable. Why? By supposition ind(ℱ_P) is decidable, so in
order to compute f(x) for an arbitrary x ∈ N, we first decide whether x ∈ ind(ℱ_P) or x ∈ N ∖ ind(ℱ_P)
and, depending on the answer, assign f(x) = b or f(x) = a, respectively.
So, by the Fixed-Point Theorem, f should have a fixed point. But it is obvious that there can be
no fixed point: Namely, for an arbitrary x ∈ N, the numbers x and f(x) are in different sets ind(ℱ_P)
and N ∖ ind(ℱ_P), so ψ_x ≠ ψ_{f(x)}. This is a contradiction. Hence, ind(ℱ_P) cannot be decidable.
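
The function f used in the proof is easy to write down once a decider for the set is available. The following Python sketch builds it for a decidable toy set of our own choosing, the even numbers. This set is not an index set, so no contradiction arises; the sketch only shows that f is total, computable from the decider, and always moves its argument to the other side. The names make_f and is_even are hypothetical.

```python
def make_f(in_set, a: int, b: int):
    """Return f that maps members of the set to b (a non-member)
    and non-members to a (a member)."""
    assert in_set(a) and not in_set(b)
    return lambda x: b if in_set(x) else a

# Toy decidable set (an assumption): the even numbers.
def is_even(x: int) -> bool:
    return x % 2 == 0

f = make_f(is_even, a=0, b=1)

# x and f(x) always lie on different sides of the set.
always_moved = all(is_even(x) != is_even(f(x)) for x in range(1000))
print(always_moved)   # True
```

For a genuine index set, "different sides" would mean that the functions with indexes x and f(x) differ, which is exactly what contradicts the Fixed-Point Theorem.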


There is some bad news brought by Rice’s Theorem. We have just proved that the
question D_P = “Does φ have the property P?” is only decidable for trivial properties P.
But trivial properties are not very interesting. A property is more interesting
if it is non-trivial, i.e., if it is shared by some functions and not by others.
Non-trivial properties are less predictable, so getting the answer to the above question is
usually more “dramatic,” particularly when the consequences of different answers
are very different. In this respect Rice’s Theorem is disillusioning because it
tells us that any attempt to algorithmically fully recognize an interesting property of
functions is in vain.

But there is good news too. Rice’s Theorem tells us that determining whether the
problem D_P is decidable or not can be reduced to determining whether or not the
property P of functions is trivial. But the latter is usually easy to do. So we can set
up the following method of proving the undecidability of decision problems of the
kind D_P. In this way we can prove the undecidability of the decision problems listed
in Section 8.3.5.

Method. (Undecidability of Problems) Given a property P, the undecidability
of the decision problem D_P = “Does a p.c. function φ have the property P?”
can be proved as follows:


1. Try to show that P meets the following conditions:
   a. P is a property sensible for functions;
   b. P is insensitive to the machine, algorithm, or program used to compute φ.
2. If P fulfills the above conditions, then try to show that P is non-trivial.
   To do this,
   a. find a p.c. function that has property P;
   b. find a p.c. function that does not have property P.


If all the steps are successful, then the problem D_P is undecidable.


9.4.3 Rice’s Theorem for C.E. Sets 


Let R be a property that is sensible for sets and let S be an arbitrary c.e. set. We 
define the following decision problem: 


D_R = “Does a c.e. set S have the property R?”


As with functions, we say that the property R of c.e. sets is decidable if the problem
D_R is decidable. So, if R is decidable, then there is a Turing machine capable
of deciding, for an arbitrary c.e. set S, whether or not R holds for S. For the same 
reasons as above, we are only interested in the intrinsic properties of sets. These are 
the properties R that are independent of the way of recognizing the set S. Hence, the 
answer to the question of whether or not S has the property R is insensitive to the 
local behavior of the machine, algorithm, and program that are used to recognize S. 
For example, being finite is an intrinsic property of sets, because every set S either 
is or is not finite, irrespective of the way in which S is recognized. Finally, we say 
that R is trivial if it holds for all c.e. sets or for none. 


Theorem 9.6. (Rice’s Theorem for c.e. Sets) Let R be an arbitrary intrinsic 
property of c.e. sets. Then: R is decidable ⟺ R is trivial.


Box 9.2 (Proof of Theorem 9.6). 


Let S be an arbitrary c.e. set and define 𝒮_R = {X | X is a c.e. set with property R}. Then we can
restate the above decision problem as


D_R = S ∈? 𝒮_R. (1)

Since a c.e. set is the domain of a p.c. function (see Theorem 6.6 on p. 151), we have S = dom(φ_x)
for some x ∈ N, hence 𝒮_R = {dom(φ_i) | φ_i is a p.c. function ∧ R(dom(φ_i))}. Now let ℱ_P be the
set of all p.c. functions with the property P that their domains have the property R. Note that since
R is intrinsic to c.e. sets, P is intrinsic to p.c. functions. Then (1) can be restated as


D_R = φ_x ∈? ℱ_P. (2)


We now proceed as in Sect. 9.4.2. First, we introduce the set ind(ℱ_P) of the indexes of all of the
p.c. functions that are in ℱ_P. This allows us to restate (2) as


D_R = x ∈? ind(ℱ_P). (3)


So, D_R is decidable iff x ∈? ind(ℱ_P) is decidable. But the latter is (by Theorem 9.5) decidable
iff ind(ℱ_P) is either ∅ or N. Hence, D_R is decidable iff either none or all of the domains of p.c.
functions have the property R; that is, iff either none or all of the c.e. sets have the property R.
In other words, D_R is decidable iff R is trivial.


Therefore, determining whether the problem D_R is decidable or not can be reduced
to determining whether or not R is a trivial property of c.e. sets. Again the
latter is much easier to do.

Based on this we can set up a method for proving undecidability of problems D_R
for different properties R. The method can easily be obtained from the previous one
(just substitute R for P and c.e. sets for p.c. functions). In this way we can prove the
undecidability of many decision problems, such as: “Is a in X? Is X equal to A?
Is X regular? Is X computable?”, where a and A are parameters.


Remarks. (1) Since c.e. sets are semi-decidable and vice versa, Rice’s Theorem for c.e. sets can be 
stated in terms of semi-decidable sets: Any nontrivial property of semi-decidable sets is undecidable. 
(2) What about the properties of computable (i.e. decidable) sets? Is there an analogue of Rice’s 
Theorem for these? This does not automatically follow from Rice’s Theorem for c.e. sets, as the 
class of computable sets is properly contained in the class of c.e. sets. Still, the answer is yes. 
Informally, the analogue states: If a property R of computable sets is decidable then finitely many
integers determine whether a computable set X has the property R.


9.4.4 Consequences: Behavior of Abstract Computing Machines 


Let us return to Rice’s Theorem for index sets. Since indexes represent Turing pro- 
grams (and Turing machines), this version of Rice’s Theorem refers to TPs (and 
TMs). In essence, it tells us that only trivial properties of TPs (and TMs) are decid- 
able. For example, whether or not an arbitrary TM halts is an undecidable question, 
because the property of halting is not trivial. 

This version of Rice’s Theorem also discloses an unexpected relation between
the local and global behavior of TMs and, consequently, of abstract computing machines
in general. Recall from p. 158 that we identified the global behavior of a TM
T with its proper function (i.e., the function φ_T the machine T computes) or with its
proper set (i.e., the set L(T) the machine accepts and hence recognizes). In contrast,
the local behavior of T refers to the particular way in which T’s program δ executes.
Rice’s Theorem tells us that, given an arbitrary T, we know perfectly how T behaves
locally, but we are unable to algorithmically determine its global behavior, unless
this behavior is trivial (in the above sense). Since the models of computation are
equivalent, we conclude: In general, we cannot algorithmically predict the global
behavior of an abstract computing machine from the machine’s local behavior.


9.5 Incomputability of Other Kinds of Problems 


In the previous sections we were mainly interested in decision problems. Now that 
we have developed a good deal of the theory and methods for dealing with decision 
problems, it is time to take a look at other kinds of computational problems. As 
mentioned in Sect. 8.1.1, there are also search problems, counting problems, and 
generation problems. For each of these kinds there exist incomputable problems. 

In this section we will prove the incomputability of a certain search problem. 
Search problems are of particular interest because they frequently arise in practice.
For example, the sorting problem is a search problem: To sort a sequence
a_1, a_2, ..., a_n of numbers is the same as to find a permutation i_1, i_2, ..., i_n of indexes
1, 2, ..., n so that a_{i_1} ≤ a_{i_2} ≤ ... ≤ a_{i_n}. Actually, every optimization problem is a
search problem because it involves the search for a feasible solution that best fulfills
a given set of conditions.
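
The view of sorting as a search for a permutation of indexes can be made concrete with a short Python sketch. The function name sorting_permutation is our own; it relies on Python's built-in sorted and returns 1-based indexes to match the notation i_1, i_2, ..., i_n.

```python
def sorting_permutation(a):
    """Return 1-based indexes i_1, ..., i_n such that
    a[i_1] <= a[i_2] <= ... <= a[i_n] (in 1-based notation)."""
    return [i + 1 for i in sorted(range(len(a)), key=lambda i: a[i])]

a = [30, 10, 20]
perm = sorting_permutation(a)

print(perm)                        # [2, 3, 1]
print([a[i - 1] for i in perm])    # [10, 20, 30]
```

The solution of the search problem is the permutation itself; the sorted sequence is then obtained by reading a in the order given by the permutation.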

We now focus on the following search problem. 


A SHORTEST EQUIVALENT PROGRAM 
Let code(A) be an arbitrary program describing an algorithm A. The aim is: 


“Given a program code(A), find the shortest equivalent program.” 


So the problem asks for the shortest description of an arbitrary given algorithm A. 


Before we continue we must clear up several things. First, we consider two pro- 
grams to be equivalent if, for every input, they return the same results (although 
they compute the results in different ways). Secondly, in order to speak about the 
shortest programs, we must define the length of a program. Since there are several
ways to do this, we ask, which of them make sense? Here, the following must hold 
if a definition of the program length is to be considered reasonable: 


If programs get longer, then, eventually, 
their shortest equivalent programs get longer too. 


To see this, observe that if the lengths of the shortest equivalent programs were 
bounded above by a constant, then there would be only a finite number of shortest 
programs. This would imply that all of the programs could be shortened into only 
a finite number of shortest equivalent programs. But this would mean that only a 
finite number of computational problems can be algorithmically solved. 

Now we are prepared to state the following proposition.


Proposition 9.1. The problem SHORTEST EQUIVALENT PROGRAM is incomputable. 


Box 9.3 (Proof of Proposition 9.1). 


(Contradiction with the Recursion Theorem) We focus on Turing programs and leave the general- 
ization to arbitrary programming languages to the reader. For the sake of contradiction we make 
the following supposition. 


Supposition (*): There exists a Turing program S that transforms every Turing program into the 
shortest equivalent Turing program. 


Three questions immediately arise: (1) What is the transformation of a Turing program? 
(2) What is the length of a Turing program? (3) When are two Turing programs equivalent? 
The answers are: 


1. The transformation of a Turing program: Every Turing program can be represented by a natural 
number, the index of this Turing program (see Sect. 6.2.1). Considering this, we can view the 
supposed Turing program S as computing a function $s : \mathbb{N} \to \mathbb{N}$ that transforms (i.e., maps) an
arbitrary index into another index. The above supposition then states, among other things, that 
s is a (total) computable function. 


2. The length of a Turing program: Indexes of Turing machines have two important properties. 
First, every Turing program is encoded by an index. Secondly, indexes increase (decrease) 
simultaneously with the increasing (decreasing) of (a) the number of instructions in Turing 
programs; (b) the number of different symbols used in Turing programs; and (c) the number of 
states needed in Turing programs. Therefore, to measure the length of a Turing program by its
index seems to be a natural choice. 


3. The equivalence of Turing programs: Recall (Sect. 6.3.1) that a Turing program P with index n
computes the values of the computable function $\varphi_n$ (i.e., the proper function of the Turing
machine $T_n$). We say that a Turing program P’ is equivalent to P if P’ computes, for every
input, exactly the same values as P. Remember also (Sect. 7.2) that the indexes of all the Turing
programs equivalent to P constitute the index set $\mathrm{ind}(\varphi_n)$. Now it is obvious that s(n) must be
the smallest element in the set $\mathrm{ind}(\varphi_n)$.


Since s(n) is an index (i.e., the code of a Turing program), we see that, by the same argument 
as above, the function s cannot be bounded above by any constant. In other words, if the lengths 
of Turing programs increase, so eventually do the lengths of their shortest equivalent programs; 
that is, 

For every n there exists an n’ > n such that s(n) < s(n’). (1) 


Using the function s we can restate our supposition (*) more precisely: 
Supposition (**): There is a (total) computable function $s : \mathbb{N} \to \mathbb{N}$ such that


$s(n) \stackrel{\text{def}}{=}$ the smallest element in $\mathrm{ind}(\varphi_n)$. (2)

We now derive a contradiction from the supposition (**). First, let $f : \mathbb{N} \to \mathbb{N}$ be the function
whose values f(x) are computed by the following Turing machine S. Informally, given an arbitrary
input $x \in \mathbb{N}$, the machine S computes and generates the values s(k), k = 0, 1, ... until an s(m) for
which s(m) > x has been generated. At that point S returns the result f(x) := s(m) and halts. Here
is a more detailed description. The machine S has one input tape, one output tape, and two work
tapes, $W_1$ and $W_2$ (see Fig. 9.8). The Turing program of S operates as follows:


1. Let $x \in \mathbb{N}$ be an arbitrary number written on the input tape of S.
2. S generates the numbers k = 0, 1, 2, ... on $W_1$ and after each generated k executes steps 3-6:
3.    S computes the value s(k);
4.    if s(k) has not yet been written to $W_2$,
5.       then S generates s(k) on $W_2$;
6.    if s(k) > x, then S copies s(k) to the output tape (so f(x) = s(k)) and halts.


Fig. 9.8 The configuration of the Turing machine S: the input tape holds x, the work tape $W_1$
holds 0, 1, 2, ..., k, the work tape $W_2$ holds s(0), s(1), s(2), ..., s(k), and the output tape holds f(x)


In step 3, the machine S computes the function s. Because s is supposed to be a computable func- 
tion (2), and (1) holds, the machine S halts for arbitrary x. Consequently, f is a computable function. 
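
The search loop of the machine S can be sketched in Python as follows. This is only an illustration under loud assumptions: the names `make_f` and `toy_s` are hypothetical, and `toy_s` is merely a total, unbounded stand-in for s. It does not compute shortest equivalent programs; no computable function does, which is exactly what the proof establishes. The bookkeeping on the work tape $W_2$ is omitted because it does not influence the result.

```python
from itertools import count

def make_f(s):
    """Return f with f(x) = s(k) for the first k = 0, 1, 2, ... such that s(k) > x."""
    def f(x):
        for k in count():      # work tape W_1: generate k = 0, 1, 2, ...
            value = s(k)       # step 3: compute s(k)
            if value > x:      # step 6: the first generated value exceeding x
                return value   # copy it to the output tape and halt
    return f

def toy_s(k):
    """Hypothetical stand-in for s: total and not bounded above by any constant."""
    return k // 2

f = make_f(toy_s)
print([f(x) for x in range(5)])
```

The loop in `f` terminates for every x only because the stand-in is total and unbounded; these are the two properties that (2) and (1) grant to the supposed function s.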


We now show that such an f cannot exist. Let x be an arbitrary natural number. We saw that there
exists an $m \in \mathbb{N}$ such that s(m) > x. But s(m) is by definition the smallest index in the set $\mathrm{ind}(\varphi_m)$,
so the relation s(m) > x tells us that
$x \notin \mathrm{ind}(\varphi_m)$.
At the same time, f(x) = s(m), so
$f(x) \in \mathrm{ind}(\varphi_m)$.

Since x and f(x) are not both in the same index set, the functions $\varphi_x$ and $\varphi_{f(x)}$ cannot be equal;
that is,

$\varphi_x \ne \varphi_{f(x)}$, for every $x \in \mathbb{N}$. (3)
The inequality (3) reveals that f has no fixed point. But we have seen that f is a computable 
function, so—according to the Fixed-Point Theorem—f should have a fixed point. This is a contra- 
diction. Consequently, f is not a computable function. This means that the machine S does not halt 
for some x. Since (1) holds, it must be that (2) does not hold. Our supposition must be rejected. 


9.6 Chapter Summary


General methods of proving the undecidability of decision problems use diagonal- 
ization, reduction, the Recursion Theorem, or Rice’s Theorem. 

Diagonalization can be used to prove that a set S of all elements with a given 
property P is uncountable. It uses a table whose vertical axis is labeled with the 
members of S and the horizontal axis by all natural numbers. The supposition is that 
each member of S can be uniquely represented by a sequence of symbols, written 
in the corresponding row, over an alphabet. If we change each diagonal symbol, and 
the obtained sequence of diagonal symbols represents an element of S, then the set is 
uncountable. Another use of diagonalization is to interpret S as the set of all indexes 
of all algorithms having a given property P. We want to know whether the property 
is decidable. The labels on the horizontal axis of the table are interpreted as inputs to 
algorithms. Each entry in the table contains the result of applying the corresponding 
algorithm on the corresponding input. Of special interest is the diagonal of the table. 
The idea is to design a shrewd algorithm that will use the alleged decider of the set 
S, and uncover the inability of the decider to decide whether or not the shrewd 
algorithm has the property P. Then the property P is undecidable. 

Reduction is a method where a given problem is expressed in terms of another 
problem in such a way that a solution to the second problem would give rise to a 
solution of the first problem. For decision problems, we can use the m-reduction or 
1-reduction. The existence of the m-reduction can be used to prove the undecidabi- 
lity of a decision problem. More powerful is the Turing reduction (see Part III). 

The Recursion Theorem can be used to prove the undecidability of a decision 
problem. The method relies on the fact that every computable function has a fixed 
point. If we prove that a given function has no fixed point, then the function cannot 
be computable. Since the characteristic function of a set is a function, this method 
can be used to prove the undecidability of a set. 

Rice’s Theorem is used to prove that certain properties of partial computable 
(p.c.) functions or computably enumerable (c.e.) sets are undecidable. The prop- 
erties must be intrinsic to functions or sets, and thus independent of the way the 
functions are computed or the sets recognized. Rice’s Theorem states that every 
nontrivial intrinsic property is undecidable. 


Problems


9.1. Prove:
(a) $\le_m$ is a reflexive and transitive relation;


(b) $A \le_m B$ if $r(A) \subseteq B$ and $r(\overline{A}) \subseteq \overline{B}$, for some computable function r.


Definition 9.2. (m-Complete Set) A set A is said to be m-complete if A is c.e. and $B \le_m A$
for every c.e. set B. A set A is 1-complete if A is c.e. and $B \le_1 A$ for every c.e. set B.


9.2. Prove: 
(a) K is m-complete; 


(b) $K_0$ is m-complete.


9.3. Prove: 
(a) K is 1-complete; 


(b) $K_0$ is 1-complete.


9.4. Prove: A is 1-complete => A is m-complete. 


9.5. Construct the following Turing machines (which we used in Example 9.5 on p. 215):
(a) a TM A that takes $\langle T, w \rangle$ as input and outputs a TM B such that


$L(B) = K_0$ if T accepts w;
$L(B) = \emptyset$ if T does not accept w.


[Hint. The actions of B on input x are as follows: B starts T on w; if T accepts w, then B
starts U (the recognizer of $K_0$) on x and outputs U’s answer (if it is a YES) as its own;
if, however, T doesn’t accept w, then U is not started, so B outputs nothing. It remains to
be shown that there exists a Turing machine A that is capable of computing the code $\langle B \rangle$.]


(b) a TM R that recognizes $\overline{K_0}$ if allowed to call the above A and the alleged decider $D_{L(Q)}$
for the problem Q = “Is L(T) computable?”


[Hint. The actions of R on input $\langle T, w \rangle$ are: R calls A on the input $\langle T, w \rangle$; hands over the
result $\langle B \rangle$ of A to $D_{L(Q)}$; and outputs $D_{L(Q)}$’s answer (if it is a YES) as its own answer.]


9.6. Prove that the following decision problems are undecidable (use any method of this chapter): 
(a) “Does T halt on empty input?” 
(b) “Does T halt on every input?” 
(c) “Does T halt on $w_0$?” ($w_0$ is an arbitrary fixed word)
(d) “Does M halt on w?” (M is a certain fixed TM)
(e) “Do T and T’ halt on the same inputs?”
(f) “Is the language L(T) empty?” (= $D_{Emp}$)
[Hint. Reduce $D_{Halt}$ to $D_{Emp}$.]
(g) “Does an algorithm A terminate on every input data?” (= $D_{Term}$)
(h) “Does an algorithm A terminate on input data d?”
(i) “Is $\mathrm{dom}(\varphi)$ empty?” (= $D_{K_1}$)
(j) “Is $\mathrm{dom}(\varphi)$ finite?” (= $D_{Fin}$)
(k) “Is $\mathrm{dom}(\varphi)$ infinite?” (= $D_{Inf}$)
(l) “Is $A - \mathrm{dom}(\varphi)$ finite?” (= $D_{Cof}$)
(m) “Is $\varphi$ total?” (= $D_{Tot}$)
(n) “Can $\varphi$ be extended to a (total) computable function?” (= $D_{Ext}$)
(o) “Is $\varphi$ surjective?” (= $D_{Sur}$)
(p) “Is $\varphi$ defined at x?”
(q) “Is $\varphi(x) = y$ for at least one x?”
(r) “Is $\mathrm{dom}(\varphi) = \mathrm{dom}(\psi)$?”
(s) “Is $\varphi \simeq \psi$?”


Part III
RELATIVE COMPUTABILITY 


In the previous part we described how the world of incomputable problems was 
discovered. This resulted in an awareness that there exist computational problems 
that are unsolvable by any reasonable means of computing, e.g., the Turing machine. 
In Part III, we will focus on decision problems only. We will raise the questions,
“What if an unsolvable decision problem had been somehow made solvable? Would 
this have turned all the other unsolvable decision problems into solvable problems?” 
We suspect that this might be possible if all the unsolvable decision problems were 
somehow reducible one to another. However, it will turn out that this is not so; some 
of them would indeed become solvable, but there would still remain others that are 
unsolvable. We might speculate even further and suppose that one of the remaining 
unsolvable decision problems was somehow made solvable. As before, this would 
turn many unsolvable decision problems into solvable ones; yet, again, there would 
remain unsolvable decision problems. We could continue in this way, but we would 
never exhaust the class of unsolvable problems. 

Questions of the kind “Had the problem Q been solvable, would this have made 
the problem P solvable too?” are characteristic of relativized computability, a large 
part of Computability Theory. This theory analyzes the solvability of problems 
relative to (or in view of) the solvability of other problems. Although such ques- 
tions seem to be overly speculative and the answers to be of questionable practical 
value, they nevertheless reveal a surprising fact: Unsolvable decision problems can 
differ in the degree of their unsolvability. We will show that the class of all decision 
problems partitions into infinitely many subclasses, called degrees of unsolvability, 
each of which consists of all equally difficult decision problems. It will turn out that, 
after defining an appropriate relation on the class of all degrees, the degrees of un- 
solvability are intricately connected in a lattice-like structure. In addition to having 
many interesting properties per se, the structure will reveal many surprising facts 
about the unsolvability of decision problems. 

We will show that there are several approaches to partitioning the class of deci- 
sion problems into degrees of unsolvability. Each of them will establish a particular 
hierarchy of degrees of unsolvability, such as the jump and arithmetical hierarchies, 
and thus offer yet another view of the solvability of computational problems. 


Chapter 10
Computation with External Help


An oracle was an ancient priest who made statements about 
future events or about the truth. 


Abstract According to the Computability Thesis, all models of computation, in- 
cluding those yet to be discovered, are equivalent to the Turing machine and for- 
malize the intuitive notion of computation. In other words, what cannot be solved 
on a Turing machine cannot be solved in nature. But what if Turing machines could 
get external help from a supernatural assistant? In this chapter we will describe the 
birth of this idea, its development, and the formalization of it in the concept of the 
oracle Turing machine. We will then briefly describe how external help can be added 
to other models of computation, in particular to $\mu$-recursive functions. We will conclude
with the Relative Computability Thesis, which asserts that all such models are
equivalent, one to the other, thus formalizing the intuitive notion of “computation 
with external help.” Based on this, we will adopt the oracle Turing machine as the 
model of computation with external help. 


10.1 Turing Machines with Oracles 


“What if an unsolvable decision problem were somehow made solvable?” For in- 
stance, what if we somehow managed to construct a procedure that is capable of 
mechanically solving an arbitrary instance of the Halting Problem? But in making 
such a supposition, we must be cautious: Since we have already proved that no algo- 
rithm can solve the Halting Problem, such a supposition would be in contradiction 
with this fact if we meant by the hypothetical procedure an algorithm as formalized 
by the Computability Thesis. Making such a supposition would render our theory in- 
consistent and entail all the consequences described in Box 4.1 (p. 56). So, to avoid 
such an inconsistency, we must assume that the hypothetical procedure would not 
be the ordinary Turing machine. 

Well, if the hypothetical procedure is not the ordinary Turing machine, then what 
is it? The answer was suggested by Turing himself. 


10.1.1 Turing’s Idea of Oracular Help


In 1939, Turing published yet another seminal idea, this time of a machine he called 
the o-machine. Here is the citation from his doctoral thesis: 


Let us suppose that we are supplied with some unspecified means of solving number- 
theoretic problems; a kind of oracle as it were. We shall not go any further into the nature 
of this oracle apart from saying that it cannot be a machine. With the help of the oracle 
we could form a new kind of machine (call them o-machines), having as one of its funda- 
mental processes that of solving a given number-theoretic problem. More definitely these 
machines are to behave in this way. The moves of the machine are determined as usual by 
a table except in the case of moves from a certain internal configuration $\mathfrak{o}$. If the machine
is in the internal configuration $\mathfrak{o}$ and if the sequence of symbols marked with l is then the
well-formed formula A, then the machine goes into the internal configuration $\mathfrak{p}$ or $\mathfrak{t}$ according
as it is or is not true that A is dual. The decision as to which is the case is referred to the
oracle. 

These machines may be described by tables of the same kind as those used for the description
of a-machines, there being no entries, however, for the internal configuration $\mathfrak{o}$.
We obtain description numbers from these tables in the same way as before. If we make the
convention that, in assigning numbers to internal configurations, $\mathfrak{o}$, $\mathfrak{p}$, $\mathfrak{t}$ are always to be
$q_2, q_3, q_4$, then the description numbers determine the behavior of the machines uniquely.


Remarks. Let us add some remarks about the terminology and notation used in the above passage. 
The “internal configuration” is what we now call the state of the Turing machine. The “sequence 
of symbols marked with l” is a sequence of symbols on the tape that are currently marked in a
certain way. We will see shortly that this marking can be implicit. The “duality” is a property that a
formula has or does not have; we will not need this notion in the subsequent text. The “table” is the
table A representing the Turing program $\delta$ as described in Fig. 6.3 (see p. 113). To comply with our
notation, we will from now on substitute Gothic symbols as follows: $\mathfrak{o} \mapsto q_?$, $\mathfrak{p} \mapsto q_+$, and $\mathfrak{t} \mapsto q_-$.


Now we see that the set Q of the states of the o-machine would contain, in addition
to the usual states, three distinguished states $q_?, q_+, q_-$. The operation of the
o-machine would be directed by an ordinary Turing program, i.e., a transition function
$\delta : Q \times \Gamma \to Q \times \Gamma \times \{\text{Left}, \text{Right}, \text{Stay}\}$. The function would be undefined in the
state $q_?$, for every $z \in \Gamma$, i.e., $\forall z \in \Gamma\ \ \delta(q_?, z)\uparrow$. However, upon entering the state $q_?$,
the o-machine would not halt; instead, an oracle would miraculously supply, in the
very next step of the computation, a cue telling the o-machine which of the states $q_+$
and $q_-$ to enter. In this way the oracle would answer the o-machine’s question asking
whether or not the currently marked word on the tape is in a certain, predefined set of
words, a set which can be decided by the oracle. The o-machine would immediately
take up the cue, enter the suggested state, and continue executing its program $\delta$.


Further Development of Turing’s Idea 


Turing did not further develop his o-machine. The idea remained dormant until 
1944, when Post awakened it and brought it into use. Let us describe how the idea 
evolved. We will focus on the interplay between the o-machine and its oracle. In doing
so, we will lean heavily on Definition 6.1 of the ordinary TM (see Sect. 6.1.1).

A Turing machine with an oracle, i.e., the o-machine, consists of the following 
components: (1) a control unit with a program; (2) a potentially infinite tape divided 
into cells, each of which contains a symbol from a tape alphabet $\Gamma = \{z_1, z_2, \ldots, z_t\}$,
where $z_1 = 0$, $z_2 = 1$, and $z_t = \sqcup$; (3) a window that can move to the neighboring
(and, eventually, to any) cell, thus making the cell accessible to the control unit for 
reading or writing; and (4) an oracle. (See Fig. 10.1.) 


Fig. 10.1 The o-machine with 
the oracle for the set O can 
ask the oracle whether or not 
the word w is in O. The oracle 
answers in the next step and 
the o-machine continues its 
execution appropriately 


The control unit is always in some state from a set $Q = \{q_1, \ldots, q_s, q_?, q_+, q_-\}$.
As usual, the state $q_1$ is initial and some of $q_1, \ldots, q_s$ are final states. But now there
are also three additional, special states, denoted by $q_?$, $q_+$, and $q_-$.

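The program is characteristic of the particular o-machine; as usual, it is a partial
function $\delta : Q \times \Gamma \to Q \times \Gamma \times \{\text{Left}, \text{Right}, \text{Stay}\}$. But, in addition to the usual
instructions of the form $\delta(q_i, z_r) = (q_j, z_w, D)$, for $1 \le i, j \le s$ and $1 \le r, w \le t$, there
are four special instructions for each $z_r \in \Gamma$:


$\delta(q_?, z_r) = (q_+, z_r, \text{Stay})$   (*)
$\delta(q_?, z_r) = (q_-, z_r, \text{Stay})$


$\delta(q_+, z_r) = (q_{j_1}, z_{w_1}, D_1)$   (**)
$\delta(q_-, z_r) = (q_{j_2}, z_{w_2}, D_2)$


where $q_{j_1}, q_{j_2} \in Q - \{q_+, q_-\}$, $z_{w_1}, z_{w_2} \in \Gamma$, and $D_1, D_2 \in \{\text{Left}, \text{Right}, \text{Stay}\}$.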


Before the o-machine is started the following preparative arrangement is made: (1)
an input word belonging to the set $\Sigma^*$ such that $\{0,1\} \subseteq \Sigma \subseteq \Gamma - \{\sqcup\}$ is written on
the tape; (2) the window is shifted over the leftmost symbol of the input word; (3)
the control unit is set to the initial state $q_1$; and (4) an arbitrary set $O \subseteq \Sigma^*$ is fixed.
We call O the oracle set.

From now on the o-machine operates in a mechanical stepwise fashion as directed
either by the program $\delta$ or by the oracle. The details are as follows. Suppose
that the control unit is in a state $q \in Q$ and the cell under the window contains a
symbol $z_r \in \Gamma$. (See Fig. 10.1.) Then:


• If $q \ne q_?$, then the o-machine reads $z_r$ into its control unit. If $\delta(q, z_r)\downarrow = (q', z_w, D)$,
the control unit executes this instruction as usual (i.e., writes $z_w$ through the window,
moves the window in direction D, and enters the state $q'$); otherwise, the
o-machine halts. In short, when the o-machine is not in state $q_?$, it operates as an
ordinary Turing machine.


• If $q = q_?$, then the oracle takes this as the question
$w \overset{?}{\in} O$, (***)


where w is the word starting under the window and ending in the rightmost
nonempty cell of the tape. The oracle answers the question (***) in the next
step of the computation by advising the control unit which of the special states
$q_+$, $q_-$ to enter (without changing $z_r$ under the window or moving the window).
We denote the oracle’s advice by “+” when $w \in O$, and by “−” when $w \notin O$. The
control unit always takes the advice and enters the state either $q_+$ or $q_-$. In short,
the oracle helps the program $\delta$ to choose the “right” one of the two alternative
instructions (*). In this way the oracle miraculously resolves the nondeterminism
arising from the state $q_?$.


After that, the o-machine continues executing its program. In particular, if the previous
state has been $q_?$, the program will execute the instruction either $\delta(q_+, z_r) =
(q_{j_1}, z_{w_1}, D_1)$ or $\delta(q_-, z_r) = (q_{j_2}, z_{w_2}, D_2)$, depending on the oracle’s advice. Since
the execution continues in one or the other way, this allows the program $\delta$ to react
differently to different advice. In short, the oracle’s advice will be taken into account
by the Turing program $\delta$.

The o-machine can ask its oracle, in succession, many questions. Since between 
two consecutive questions the o-machine can move the window and change the 
tape’s contents, the questions $w \overset{?}{\in} O$, in general, differ in words w. But note that
they all refer to the set O, which was fixed before the computation started. 

The o-machine halts if the control unit has entered a final state, or the program 
specifies no next instruction (the machine reads $z_r$ in a state $q\ (\ne q_?)$ and $\delta(q, z_r)\uparrow$).


Remark. Can the o-machine ask the oracle a question $u \overset{?}{\in} O$, where u is not the word currently
starting under the window and ending at the rightmost nonempty symbol of the tape? The answer
is yes. The machine must do the following: It remembers the position of the window, moves the
window past the rightmost nonempty symbol, writes u to the tape, moves the window back to the
beginning of u, and enters the state $q_?$. Upon receiving the answer to the question $u \overset{?}{\in} O$, the machine
deletes the word u, returns the window to the remembered position, enters the state $q_+$ or $q_-$
depending on the oracle’s answer, and continues the “interrupted” computation.


10.1.2 The Oracle Turing Machine (o-TM) 


To obtain a modern definition of the Turing machine with an oracle, we make the 
following two adaptations to the above o-machine: 


1. We model the oracle with an oracle tape. The oracle tape is a one-way unbounded,
read-only tape that contains all the values of the characteristic function
$\chi_O$. (See Fig. 10.2.) We assume that it takes, for arbitrary $w \in \Sigma^*$, only one step
to search the tape and return the value $\chi_O(w)$.


2. We eliminate the special states $q_?, q_+, q_-$ and redefine the instructions’ semantics.
Since the special states behave differently, we will get rid of them. How
can we do that? The idea is to 1) adapt the o-machine so that it will interrogate
the oracle at each instruction of the program, and 2) leave it to the machine to
decide whether or not the oracle’s advice will be taken into account. To achieve
goal 2 we must only use what we learned in the previous subsection: The oracle’s
advice on the question $\delta(q_?, z_r)$ will have no impact on the computation
iff $\delta(q_+, z_r) = \delta(q_-, z_r)$. To achieve goal 1, however, we must redefine what
actions of the machine will be triggered by the instructions.¹ In particular, we
must redefine the instructions in such a way that each instruction will make the
machine consult the oracle. Clearly, this will call for a change in the definition of
the transition function.


Remark. We do not concern ourselves with how the values $\chi_O(w)$ emerged on the oracle tape and
how it is that any of them can be found and read in just one step. A realistic implementation of 
such an access to the contents of the oracle tape would be a tremendous challenge when the set O 
is infinite. Actually, if we knew how to do this for any set O then according to the Computability 
Thesis there would exist an ordinary Turing machine capable of providing, in finite time, any data 
from the oracle tape. Hence, oracle Turing machines could be simulated by ordinary TMs. This 
would dissolve the supernatural flavor of the oracle Turing machine and bring back the threat of
inconsistency (as we explained on p. 233).


Fig. 10.2 The oracle Turing
machine has an oracle tape.
Given an oracle set $O \subseteq \Sigma^*$,
the oracle tape contains all
the values of the characteristic
function $\chi_O$. Here,
$e = \chi_O(w)$, for any $w \in \Sigma^*$

¹ In other words, we must redefine the operational semantics of the instructions.

After making these adaptations we obtain a modern definition of the Turing machine
that uses external help. We will call it the oracle Turing machine, or for short, o-TM. 


Definition 10.1. (Oracle Turing Machine) The oracle Turing machine with
oracle set O (for short, o-TM with oracle set O, or O-TM) consists of a
control unit, an input tape, an oracle Turing program, an oracle tape, and a set
O. Formally, O-TM is an eight-tuple $T^O = (Q, \Sigma, \Gamma, \widetilde{\delta}, q_1, \sqcup, F, O)$.


The control unit is in a state from the set of states $Q = \{q_1, \ldots, q_s\}$, $s \ge 1$. We
call $q_1$ the initial state; some states are final and belong to the set $F \subseteq Q$.

The input tape is a usual one-way unbounded tape with a window. The
tape alphabet is $\Gamma = \{z_1, \ldots, z_t\}$, where $t \ge 3$. For convenience we fix $z_1 = 0$,
$z_2 = 1$, and $z_t = \sqcup$. The input alphabet is a set $\Sigma$, where $\{0,1\} \subseteq \Sigma \subseteq \Gamma - \{\sqcup\}$.

The oracle set O is an arbitrary subset of $\Sigma^*$.

The oracle tape contains, for each $w \in \Sigma^*$, the value $\chi_O(w)$ of the characteristic
function $\chi_O : \Sigma^* \to \{0,1\}$. It is a read-only tape; although lacking a
window, it can immediately find and return the value $\chi_O(w)$, for any $w \in \Sigma^*$.

The oracle Turing program (for short, o-TP) resides in the control unit; it
is a partial function


$\widetilde{\delta} : Q \times \Gamma \times \{0,1\} \to Q \times \Gamma \times \{\text{Left}, \text{Right}, \text{Stay}\}$.


Thus, any instruction is of the form


$\widetilde{\delta}(q, z, e) = (q', z', D)$,


which is interpreted as follows: If the control unit is in the state q, and
reads z and e from the input and oracle tape, respectively, then it changes
to the state $q'$, writes $z'$ on the input tape, and moves the window in the
direction D. Here, e denotes the value $\chi_O(w)$, where w is the word starting
under the window and ending in the rightmost nonempty cell of the input tape.


Before o-TM is started, the following take place: 1) an input word belonging
to $\Sigma^*$ is written to the beginning of the input tape; 2) the window is shifted
to the beginning of the tape; 3) the control unit is set to the initial state $q_1$; 4)
an oracle set $O \subseteq \Sigma^*$ is fixed. From now on O-TM operates in a mechanical
stepwise fashion, as instructed by $\widetilde{\delta}$. The machine halts when it either enters
a final state, or reads z and e in a state q such that $\widetilde{\delta}(q, z, e)\uparrow$.


NB The oracle is not omnipotent; it is just an unsurpassable expert at recognizing 
the current set $O \subseteq \Sigma^*$. Since we adopted the Computability Thesis, we must admit
that this capability is supernatural if O is an undecidable set (in the ordinary sense). 
In such a case we will not ask where the oracle’s expertise comes from. 
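
The operation described in Definition 10.1 can be sketched as a small Python simulator. Everything below is an illustrative assumption rather than part of the definition: the name `run_otm`, the dictionary representation of the o-TP, the blank symbol `"_"`, the convention that a Left move on the leftmost cell leaves the window in place, and the step budget. The oracle is passed as a Python predicate, so it can only stand in for a decidable oracle set; a genuinely undecidable O cannot be supplied this way.

```python
def run_otm(program, final_states, oracle, tape_input,
            start="q1", blank="_", max_steps=10_000):
    """Simulate an o-TM; program maps (q, z, e) to (q', z', D)."""
    tape = list(tape_input) or [blank]
    pos, state = 0, start
    for _ in range(max_steps):
        if state in final_states:                 # halt: final state entered
            return state, "".join(tape).rstrip(blank)
        # w: the word from the window to the rightmost nonblank cell
        w = "".join(tape[pos:]).rstrip(blank)
        e = 1 if oracle(w) else 0                 # value read from the oracle tape
        key = (state, tape[pos], e)
        if key not in program:                    # halt: no instruction applies
            return state, "".join(tape).rstrip(blank)
        state, tape[pos], move = program[key]
        if move == "Right":
            pos += 1
            if pos == len(tape):
                tape.append(blank)                # extend the one-way tape on demand
        elif move == "Left":
            pos = max(pos - 1, 0)                 # assumption: stay put at the left end
    raise RuntimeError("step budget exhausted")

# A one-question program: ask the oracle about the whole input, then halt.
program = {}
for z in "01_":
    program[("q1", z, 1)] = ("q_yes", z, "Stay")
    program[("q1", z, 0)] = ("q_no", z, "Stay")
finals = {"q_yes", "q_no"}

is_palindrome = lambda w: w == w[::-1]            # stand-in oracle set
print(run_otm(program, finals, is_palindrome, "0110"))
print(run_otm(program, finals, is_palindrome, "01"))
```

Running the same `program` with a different predicate, say `lambda w: len(w) % 2 == 0`, requires no change to the program itself; only the values read from the oracle, and hence the execution, change.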


10.1.3 Some Basic Properties of o-TMs


Let us list a few basic properties of o-TMs that follow directly from Definition 10.1. 
First, looking at Definition 10.1 we find that if we change the oracle set, say from
O to $O' \subseteq \Sigma^*$, no change is needed in the o-TP $\widetilde{\delta}$, i.e., no instruction $\widetilde{\delta}(q, z, e) =
(q', z', D)$ must be changed, added, or deleted. The program $\widetilde{\delta}$ remains capable of
running with the new oracle set $O'$. In other words, the o-TP $\widetilde{\delta}$ is insensitive to the
oracle set O. (By the way, this is why the symbol $\widetilde{\delta}$ has no symbol O attached.)


Consequence 10.1. Oracle Turing programs $\widetilde{\delta}$ are not affected by changes in O.


Second, although changing O does not affect the o-TP $\widetilde{\delta}$, it does affect, in general,
the execution of $\widetilde{\delta}$. Namely, if O changes, $\chi_O$ also changes, so the values e
read from the oracle tape change too. But these values are used to select the next
instruction to be executed. Nevertheless, we can construct o-TPs whose executions
are insensitive to changes in the oracle set. Such an o-TP must just strictly ignore
the values e read from the oracle tape. To achieve this, the o-TP must be such that


o-TP has instr. $\widetilde{\delta}(q, z, 0) = (q', z', D)$ $\Longleftrightarrow$ o-TP has instr. $\widetilde{\delta}(q, z, 1) = (q', z', D)$,


for every pair $(q, z) \in Q \times \Gamma$. So, if o-TM reads z in state q, it performs $(q', z', D)$
regardless of the value e. If the above only holds for certain pairs (q, z), the o-TP $\widetilde{\delta}$
ignores the oracle for those pairs only. We sum this up in the following statement.


Consequence 10.2. During the execution, oracle Turing programs $\widetilde{\delta}$ can ignore O.


Third, we ask: What is the relation between the computations performed by 
o-TMs and the computations performed by ordinary TMs? Let T be an arbitrary
ordinary TM and $\delta$ its TP. We can easily construct an o-TM that simulates TM T.
Informally, the o-TP $\widetilde{\delta}$ does, while ignoring its oracle set, what the TP $\delta$ would do.
To achieve this, we must construct the program $\widetilde{\delta}$ in the following way:


let $\widetilde{\delta}$ have no instructions;
for each instruction $\delta(q, z) = (q', z', D)$ in $\delta$ do
    add instructions $\widetilde{\delta}(q, z, 0) = (q', z', D)$ and $\widetilde{\delta}(q, z, 1) = (q', z', D)$ to $\widetilde{\delta}$.
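
The construction above translates almost line by line into Python. In this sketch, programs are represented as dictionaries, and the names `lift_to_oracle_program` and `ignores_oracle` are hypothetical helpers introduced only for illustration; the second one checks the oracle-ignoring condition stated before Consequence 10.2.

```python
def lift_to_oracle_program(delta):
    """Build an o-TP that ignores the oracle from an ordinary TP.

    delta maps (q, z) to (q', z', D); the result maps (q, z, e) to (q', z', D).
    """
    delta_tilde = {}                         # let the o-TP have no instructions
    for (q, z), action in delta.items():     # for each instruction of the ordinary TP
        delta_tilde[(q, z, 0)] = action      # same action when the oracle value is 0
        delta_tilde[(q, z, 1)] = action      # and when the oracle value is 1
    return delta_tilde

def ignores_oracle(delta_tilde):
    """Check that every instruction has an identical twin for the other oracle value."""
    return all(delta_tilde.get((q, z, 1 - e)) == action
               for (q, z, e), action in delta_tilde.items())

# A toy ordinary TP that flips bits while moving right.
delta = {("q1", "0"): ("q1", "1", "Right"),
         ("q1", "1"): ("q1", "0", "Right")}
delta_tilde = lift_to_oracle_program(delta)
print(len(delta_tilde), ignores_oracle(delta_tilde))
```

Each ordinary instruction yields exactly two oracle instructions, so the lifted program is twice as large and behaves identically under every oracle set.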


So we can state the following conclusion. 

Theorem 10.1. Oracle computation is a generalization of ordinary computation.


Fourth, we now see that each O-TM, say $T^O = (Q, \Sigma, \Gamma, \widetilde{\delta}, q_1, \sqcup, F, O)$, is characterized
by two independent components:


1. a particular o-TM, denoted by $T^* = (Q, \Sigma, \Gamma, \widetilde{\delta}, q_1, \sqcup, F, *)$, which is capable of
consulting any particular oracle set when it is selected and substituted for $*$;
2. a particular oracle set O which is to be “plugged into” $T^*$ in place of $*$.


Formally,
$T^O = (T^*, O)$.


10.1.4 Coding and Enumeration of o-TMs 


Like ordinary Turing programs, oracle Turing programs can also be encoded and
enumerated. The coding proceeds in a similar fashion to that in the ordinary case
(see Sect. 6.2.1). But there are differences too: Firstly, we must take into account
that oracle Turing programs $\widetilde{\delta}$ are defined differently from ordinary ones $\delta$, and,
secondly, oracle sets must be taken into account somehow. Let us see the details.


Coding of o-TMs 


To encode an O-TM $T^O = (T^*, O)$ we will only encode the component $T^*$. The idea
is that we only encode the o-TP $\widetilde{\delta}$ of $T^*$, but in such a way that the other components
$Q, \Sigma, \Gamma, F$, which determine the particular $T^*$, can be restored from the code. An
appropriate coding alphabet is $\{0,1\}$. This is because $\{0,1\}$ is contained in the
input alphabet of every o-TM, so every o-TM will be able to read codes of other
o-TMs as input data. Here are the details.


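Let $T^O = (Q,\Sigma,\Gamma,\widetilde{\delta},q_1,\sqcup,F,O)$ be an arbitrary O-TM. If
$\widetilde{\delta}(q_i,z_j,e) = (q_k,z_\ell,D_m)$
is an instruction of the program $\widetilde{\delta}$, we encode the instruction by the word
$K = 0^i 1 0^j 1 0^e 1 0^k 1 0^\ell 1 0^m$, (*)


where $D_1$ = Left, $D_2$ = Right, and $D_3$ = Stay.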
In this way, we encode each instruction of the program $\widetilde{\delta}$. From the obtained
codes $K_1, K_2, \ldots, K_r$ we construct the code of the o-TP $\widetilde{\delta}$ as follows:


$\langle\widetilde{\delta}\rangle = 111\,K_1\,11\,K_2\,11 \ldots 11\,K_r\,111$. (**)


The restoration of $Q, \Sigma, \Gamma, F$ would proceed in a similar way to the ordinary case
(see Sect. 7.2). We can therefore identify the code $\langle T^*\rangle$ with the code $\langle\widetilde{\delta}\rangle$.
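

The coding scheme can be illustrated by a short Python sketch. It assumes that an instruction
$\widetilde{\delta}(q_i,z_j,e) = (q_k,z_\ell,D_m)$ is represented by the tuple of its indexes
(i, j, e, k, l, m); this representation and the function names are hypothetical and serve only
as an illustration of the words (*) and (**).

```python
def encode_instruction(i, j, e, k, l, m):
    """Return the word 0^i 1 0^j 1 0^e 1 0^k 1 0^l 1 0^m for one instruction."""
    blocks = ["0" * n for n in (i, j, e, k, l, m)]
    return "1".join(blocks)


def encode_program(instructions):
    """Return the code 111 K_1 11 K_2 11 ... 11 K_r 111 of an oracle Turing program.

    instructions: list of index tuples (i, j, e, k, l, m)   (assumed format)
    """
    codes = [encode_instruction(*ins) for ins in instructions]
    return "111" + "11".join(codes) + "111"


def index_of(instructions):
    """Interpret the code as a binary numeral and return the natural number."""
    return int(encode_program(instructions), 2)
```

Note that for e = 0 the third block is empty, so the instruction code contains two adjacent 1s;
the position of that block within the instruction still identifies it.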


Enumeration of o-TMs


Since $\langle T^*\rangle$ is a word in $\{0,1\}^*$, we can interpret it as the binary representation of a
natural number. We call this number the index of the o-TM $T^*$ (and of its o-TP $\widetilde{\delta}$).

Clearly, some natural numbers have binary representations that are not of the
form (**). Such numbers are not indexes of any o-TM $T^*$. This is a weakness of the
above coding, because it prevents us from establishing a surjective mapping from the
set of all o-TMs onto the set N of natural numbers. However, as in the ordinary case,
we can easily patch this by introducing a special o-TM, called the empty o-TM, and
making the following convention: Any natural number whose binary representation
is not of the form (**) is an index of the empty o-TM. (The o-TP $\widetilde{\delta}$ of the empty
o-TM is everywhere undefined, so it immediately halts on any input word.)

We are now able to state the following proposition. 


Proposition 10.1. Every natural number is the index of exactly one o-TM. 
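

The convention behind Proposition 10.1 can be sketched as a decoding procedure. The sketch
below assumes that each instruction code has the form $0^i 1 0^j 1 0^e 1 0^k 1 0^\ell 1 0^m$ with
$e \in \{0,1\}$ and all other exponents at least 1, and that a program is returned as a list of
index tuples; both the assumption on the exponents and the name `decode_index` are made
here for illustration only.

```python
import re

# One instruction code: six blocks of 0s separated by single 1s.
# The third block encodes the oracle answer e, so it has zero or one 0.
_INSTR = r"0+10+10?10+10+10+"
_PROGRAM_RE = re.compile(rf"111{_INSTR}(?:11{_INSTR})*111")
_INSTR_RE = re.compile(r"(0+)1(0+)1(0?)1(0+)1(0+)1(0+)")


def decode_index(n):
    """Return the program with index n as a list of tuples (i, j, e, k, l, m).

    Every natural number whose binary representation is not a well-formed
    program code is treated as an index of the empty program (empty list).
    """
    word = format(n, "b")
    if not _PROGRAM_RE.fullmatch(word):
        return []  # index of the empty o-TM
    body = word[3:-3]  # strip the leading and trailing 111
    return [
        tuple(len(block) for block in match.groups())
        for match in _INSTR_RE.finditer(body)
    ]
```

With this convention `decode_index` is total: every natural number yields exactly one program,
and all malformed numbers yield the same empty program.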


Given an arbitrary index $\langle T^*\rangle$ we can now restore from it the components
$Q, \Sigma, \Gamma, F$ of the corresponding o-TM $T^* = (Q,\Sigma,\Gamma,\widetilde{\delta},q_1,\sqcup,F,\bullet)$. In other words,
we have implicitly defined a total function $g : \mathbb{N} \to \mathcal{T}^*$ from the set N of natural
numbers to the set $\mathcal{T}^*$ of all o-TMs. Given an arbitrary $n \in \mathbb{N}$, we view $g(n)$ as the
nth o-TM and therefore denote this o-TM by $T_n^*$. By letting $n = 0,1,2,\ldots$ we can
generate the sequence

$T_0^*, T_1^*, T_2^*, \ldots$


of all o-TMs. Thus, we have enumerated oracle Turing machines. Clearly, the
function $g$ enumerates oracle Turing programs too, so we also obtain the sequence


$\widetilde{\delta}_0, \widetilde{\delta}_1, \widetilde{\delta}_2, \ldots$


Remarks. (1) Given an arbitrary index $n$, we can reconstruct from $n$ both the ordinary Turing
machine $T_n$ and the oracle Turing machine $T_n^*$. Of course, the two machines are different objects.
Also different are the corresponding sequences $T_0, T_1, T_2, \ldots$ and $T_0^*, T_1^*, T_2^*, \ldots$. The same holds for
the sequences $\delta_0, \delta_1, \delta_2, \ldots$ and $\widetilde{\delta}_0, \widetilde{\delta}_1, \widetilde{\delta}_2, \ldots$. (2) Plugging different oracle sets into an o-TM $T_n^*$
affects neither the o-TM nor its index. For this reason we will relax our pedantry and from now on
also say that a natural number $n$ is the index of the O-TM $T_n^O$. Conversely, given an O-TM $T^O$,
we will denote its index by $\langle T^O\rangle$, while keeping in mind that the index is independent of O.
Furthermore, we will say that the sequence $T_0^O, T_1^O, T_2^O, \ldots$ is the enumeration of O-TMs, and
remember that the ordering of the elements of the sequence is independent of the particular O.


10.2 Computation with Oracles


In this section we will continue setting the stage for the theory of oracular computation.
Specifically, we will define the basic notions about the computability of functions
and the decidability of sets by oracle Turing machines. Since we will build
on Definition 10.1, we will define the new notions using the universe $\Sigma^*$. Then we
will switch, w.l.o.g., to the equivalent universe N. This will enable us to develop and
present the theory in a simpler, unified, and more standard way.


10.2.1 Generalization of Classical Definitions 


The new definitions will be straightforward generalizations of definitions of the 
proper function, computable function, decidable set, and index set for the ordinary 
TM. We will therefore move at a somewhat faster pace. 


Proper Functionals 


First, we generalize Definition 6.4 (p. 135) of the proper function. Let $T_n^*$ be an
arbitrary o-TM and $k$ a natural number. Then we can define a mapping $\Phi_n^{(k+1)}$ with
$k+1$ arguments in the following way:


Given an arbitrary set $O \subseteq \Sigma^*$ and arbitrary words $u_1,\ldots,u_k \in \Sigma^*$, write the
words to the input tape of the O-TM $T_n^O$ and start it. If the machine halts leaving
on its tape a single word $v \in \Sigma^*$, then let $\Phi_n^{(k+1)}(O,u_1,\ldots,u_k) \overset{\mathrm{def}}{=} v$; otherwise, let
$\Phi_n^{(k+1)}(O,u_1,\ldots,u_k) \overset{\mathrm{def}}{=} \uparrow$.


In the above definition, O is variable and can be instantiated to any subset of $\Sigma^*$.
Identifying subsets of $\Sigma^*$ with their characteristic functions we see that $\chi_O$, the
characteristic function of the set O, can be any member of $\{0,1\}^{\Sigma^*}$, the set of all
the functions from $\Sigma^*$ to $\{0,1\}$. Consequently, $\Phi_n^{(k+1)}$ can be viewed as a partial
function from the set $\{0,1\}^{\Sigma^*} \times (\Sigma^*)^k$ into the set $\Sigma^*$. Informally, $\Phi_n^{(k+1)}$ maps
a function and $k$ words into a word. Now, functions that map other functions we
usually call functionals. So, $\Phi_n^{(k+1)}$ is a functional. Since it is associated with a
particular o-TM $T_n^*$, we will call $\Phi_n^{(k+1)}$ the $k+1$-ary proper functional of the o-TM $T_n^*$.

If we fix O to a particular subset of $\Sigma^*$, the argument O in $\Phi_n^{(k+1)}(O,x_1,\ldots,x_k)$
turns into a parameter, so the functional $\Phi_n^{(k+1)}(O,x_1,\ldots,x_k)$ becomes dependent
on $k$ arguments $x_1,\ldots,x_k$ only, and hence becomes a function from $(\Sigma^*)^k$ to $\Sigma^*$.
We will denote this function(al) by $\Phi_n^{O,(k)}(x_1,\ldots,x_k)$, for short $\Phi_n^{O,(k)}$, and call it
the proper functional of the O-TM $T_n^O$.

In short, each o-TM is associated with a proper functional $\Phi_n^{(k+1)}$, and each
O-TM is associated with a proper functional $\Phi_n^{O,(k)}$, for arbitrary natural $k$.
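

The step from a functional with a variable oracle to a function with a fixed oracle can be
mimicked in Python. The sketch below is only an analogy: the toy functional `phi`, the oracle
`even_ones`, and the use of `functools.partial` are illustrative assumptions, and, unlike a
genuine proper functional, the toy functional is total.

```python
from functools import partial


def phi(oracle, u):
    """A toy functional: maps an oracle (word -> 0/1) and a word u to a word.

    It returns the oracle's answers for all prefixes of u, written as a 0/1 word.
    """
    return "".join(str(oracle(u[:n])) for n in range(len(u) + 1))


def even_ones(w):
    """Characteristic function of the words with an even number of 1s (assumed oracle set)."""
    return 1 if w.count("1") % 2 == 0 else 0


# Fixing the oracle argument turns the functional into an ordinary function of u.
phi_even_ones = partial(phi, even_ones)
```

Here `phi` plays the role of the functional with the oracle as an argument, while
`phi_even_ones` plays the role of the function obtained once the oracle set is fixed.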


Remarks. (1) It is a common convention to use capital Greek letters in order to distinguish proper
functionals $\Phi$ of o-TMs from proper functions $\varphi$ of ordinary TMs. The distinction is needed because
the $\Phi$s are computed by $\widetilde{\delta}$s, while the $\varphi$s are computed by ordinary $\delta$s. (2) When the number
$k$ is understood, we will omit the corresponding superscripts in $\Phi_n^{(k+1)}$ and $\Phi_n^{O,(k)}$; for example,
we will simply say “$\Phi_n$ is a proper functional of $T_n^*$,” or “$\Phi_n^O$ is a proper functional of $T_n^O$.”


Corresponding to the enumeration of o-TMs $T_0^*, T_1^*, T_2^*, \ldots$ is, for each natural $k$,
the enumeration of proper functionals $\Phi_0^{(k+1)}, \Phi_1^{(k+1)}, \Phi_2^{(k+1)}, \ldots$. Similarly,
corresponding to the enumeration of O-TMs $T_0^O, T_1^O, T_2^O, \ldots$ is, for each natural $k$, the
enumeration of proper functionals $\Phi_0^{O,(k)}, \Phi_1^{O,(k)}, \Phi_2^{O,(k)}, \ldots$.


Function Computation 


Next, we generalize Definition 6.5 (p. 136) of the computable function. 


Definition 10.2. Let $O \subseteq \Sigma^*$, $k \geq 1$, and $\varphi : (\Sigma^*)^k \to \Sigma^*$ be a function. We say:²


$\varphi$ is O-computable if there is an O-TM that can compute $\varphi$
    anywhere on dom($\varphi$) and dom($\varphi$) = $(\Sigma^*)^k$;
$\varphi$ is partial O-computable (or O-p.c.) if there is an O-TM that can compute $\varphi$
    anywhere on dom($\varphi$);
$\varphi$ is O-incomputable if there is no O-TM that can compute $\varphi$
    anywhere on dom($\varphi$).


So, a function $\varphi : (\Sigma^*)^k \to \Sigma^*$ is O-p.c. if it is the $k$-ary proper functional of some
O-TM; that is, if there is a natural $n$ such that $\varphi \simeq \Phi_n^{O,(k)}$.


Let $S \subseteq (\Sigma^*)^k$ be an arbitrary set. In accordance with Definition 6.6 (p. 136), we
will say that


• $\varphi$ is O-computable on $S$ if $\varphi$ can be computed by an O-TM for any $x \in S$;

• $\varphi$ is O-p.c. on $S$ if $\varphi$ can be computed by an O-TM for any $x \in S$ such that
  $\varphi(x)\downarrow$;

• $\varphi$ is O-incomputable on $S$ if there is no O-TM capable of computing $\varphi$ for any
  $x \in S$ such that $\varphi(x)\downarrow$.


2 Alternatively, we can say that $\varphi$ is computable (p.c., incomputable) relative to (or, in) the set O.


Set Recognition


Finally, we can now generalize Definition 6.10 (p. 142) of decidable set. 


Definition 10.3. Let $O \subseteq \Sigma^*$ be an oracle set. For an arbitrary set $S \subseteq \Sigma^*$ we say:³


$S$ is O-decidable (or O-computable) in $\Sigma^*$ if $\chi_S$ is O-computable on $\Sigma^*$;
$S$ is O-semi-decidable (or O-c.e.) in $\Sigma^*$ if $\chi_S$ is O-computable on $S$;
$S$ is O-undecidable (or O-incomputable) in $\Sigma^*$ if $\chi_S$ is O-incomputable on $\Sigma^*$.


(Remember that $\chi_S : \Sigma^* \to \{0,1\}$ is total.)


Index Sets 


We have seen that each natural number is the index of exactly one o-TM (p. 241).
What about the other way round? Is each o-TM represented by exactly one index?
The answer is no; an o-TM has countably infinitely many indexes. We prove this in
the same fashion as we proved the Padding Lemma (see Sect. 7.2): Other indexes
of a given o-TM are constructed by padding its code with codes of redundant
instructions, and permuting the codes of instructions. So, if $e$ is the index of $T^*$ and $x$
is constructed from $e$ in the described way, then $\Phi_x^{(k+1)} \simeq \Phi_e^{(k+1)}$. This, combined
with Definition 10.2, gives us the following generalization of the Padding Lemma.


Lemma 10.1. (Generalized Padding Lemma) An O-p.c. function has countably 
infinitely many indexes. Given one of them, countably infinitely many others 
can be generated. 


Similarly to the ordinary case, the index set of an O-p.c. function $\varphi$ contains all
the indexes of all O-TMs that compute $\varphi$.


Definition 10.4. (Index Set of O-p.c. Function) The index set of an O-p.c.
function $\varphi$ is the set $\mathrm{ind}^O(\varphi) \overset{\mathrm{def}}{=} \{x \in \mathbb{N} \mid \Phi_x^O \simeq \varphi\}$.


Let $S$ be an arbitrary O-c.e. set, and $\chi_S$ its characteristic function. The index set
$\mathrm{ind}^O(\chi_S)$ we will also denote by $\mathrm{ind}^O(S)$ and call the index set of the O-c.e. set $S$.


3 We can also say that S is decidable (semi-decidable, undecidable) relative to the set O.


10.2.2 Convention: The Universe N and Single-Argument Functions


So far we have been using $\Sigma^*$ as the universe. But we have shown in Sect. 6.3.6 that
there is a bijection from $\Sigma^*$ onto N, which allows us to arbitrarily choose whether
to study the decidability of subsets of $\Sigma^*$ or of subsets of N; the findings apply to
the alternative too. It is now the right time to make use of this and switch to the
universe N. The reasons for this are that, first, N is often used in the study of relative
computability, and second, the presentation will be simpler.


NB From now on N will be the universe. 


Some definitions will tacitly (and trivially) adapt to the universe N. For instance: 


• an oracle set O is an arbitrary subset of N (Definition 10.1);

• a function $\varphi : \mathbb{N}^k \to \mathbb{N}$ is O-computable if there is an O-TM that can compute $\varphi$
  anywhere on dom($\varphi$) = $\mathbb{N}^k$ (Definition 10.2);

• a set $S \subseteq \mathbb{N}$ is O-decidable on N if $\chi_S$ is O-computable on N (Definition 10.3).


The next two adapted definitions come into play when the impact of different oracle 
sets on oracular computability is studied: 


• a proper functional of the o-TM $T_n^*$ is $\Phi_n^{(k+1)} : \{0,1\}^{\mathbb{N}} \times \mathbb{N}^k \to \mathbb{N}$ (Sect. 10.2.1).
• a proper functional of the O-TM $T_n^O$ is $\Phi_n^{O,(k)} : \mathbb{N}^k \to \mathbb{N}$ (Sect. 10.2.1).


Next, w.l.o.g., we will simplify our discussion by focusing on the case k = 1. 


NB From now on we will focus on single-argument functions (the case k = 1). 


10.3 Other Ways to Make External Help Available 


External help can be made available to other models of computation too. 

We can introduce external help to $\mu$-recursive functions. The idea is simple:
Given a set $O \subseteq \mathbb{N}$, add the characteristic function $\chi_O$ to the set $\{\zeta, \sigma, \pi_i^k\}$ of initial
functions (see Sect. 5.2.1). Any function that can be constructed from $\{\zeta, \sigma, \pi_i^k, \chi_O\}$
by finitely many applications of composition, primitive recursion, and the $\mu$-operation
uses external help if $\chi_O$ appears in the function’s construction. Kleene treated
(general) recursive functions similarly, where certain auxiliary functions (possibly
$\chi_O$) would be made available to systems of equations $\mathcal{E}(f)$.
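
The idea of adding $\chi_O$ to the initial functions can be made concrete with a small Python
sketch. It is an illustration under stated assumptions: the helpers `zero`, `succ`, `proj`,
`compose`, and `prim_rec` are hypothetical stand-ins for the initial functions and for the rules of
composition and primitive recursion (the $\mu$-operation is omitted for brevity), and `chi_O` is any
Python function that returns 0 or 1.

```python
def zero(*_args):
    """Zero function: ignores its arguments and returns 0."""
    return 0


def succ(x):
    """Successor function."""
    return x + 1


def proj(i):
    """Projection onto the i-th argument (0-based in this sketch)."""
    return lambda *args: args[i]


def compose(f, *gs):
    """Composition: (x1,...,xn) -> f(g1(x1,...,xn), ..., gm(x1,...,xn))."""
    return lambda *args: f(*(g(*args) for g in gs))


def prim_rec(base, step):
    """Primitive recursion: h(0, ys) = base(ys); h(x+1, ys) = step(x, h(x, ys), ys)."""
    def h(x, *ys):
        acc = base(*ys)
        for i in range(x):
            acc = step(i, acc, *ys)
        return acc
    return h


# Addition, built from the initial functions only.
add = prim_rec(proj(0), compose(succ, proj(1)))


def count_below(chi_O):
    """Function x -> |{i < x : i in O}|, built with chi_O as an extra initial function."""
    return prim_rec(zero, compose(add, compose(chi_O, proj(0)), proj(1)))
```

The function returned by `count_below` uses external help in the sense of the text, because
`chi_O` appears in its construction.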

Post introduced external help into his canonical systems by hypothetically adding 
primitive assertions expressing (non)membership in O. 

Davis proved the equivalence of Kleene’s and Post’s approaches. The equivalence
of each of the two and Turing’s approach was proved by Kleene and Post,
respectively. Consequently, Turing’s, Kleene’s, and Post’s definitions of functions
computable with external help are equivalent, in the sense that if a function is
computable according to one definition it is also computable according to the others.


10.4 Relative Computability Thesis 


Based on these equivalences and following the example of the Computability Thesis, 
a thesis was proposed stating that the intuitive notion of the “algorithm with external 
help” is formalized by the concept of the oracle Turing machine. 


Relative Computability Thesis. 


“algorithm with external help” $\longleftrightarrow$ oracle Turing program (or equivalent model)


Also, the intuitive notion of being “computable with external help” is formalized by 
the notion of being O-computable. 


10.5 Practical Consequences: o-TM with a Database or Network 


Since oracles are supernatural entities, the following question arises: Can we replace
a given oracle for a set O with a more realistic concept that will simulate the oracle?
Clearly, if O is a decidable set, the replacement is trivial: we only have to replace
the oracle with the ordinary TM that is the decider of the set O (or, alternatively, the
computer of the function $\chi_O$).

A different situation occurs when O is undecidable (and, hence, $\chi_O$ incomputable).
Such an O cannot be finite, because finite sets are decidable. But in practice,
we do not really need the whole set O, i.e., the values $\chi_O(x)$ for every $x \in \mathbb{N}$.
This is because the o-TM may only issue the question “$x \in? O$” for finitely many
different numbers $x \in \mathbb{N}$. (Otherwise, the machine certainly would not halt.)
Consequently, there is an $m \in \mathbb{N}$ that is the largest of all $x$s for which the question “$x \in? O$”
is issued during the computation. Of course, $m$ depends on the input word that was
submitted to the o-TM.

Now we can make use of the ideas discussed in Sect. 8.4. We could compute the
values $\chi_O(i)$, $i = 0,1,\ldots,m$ in advance, where each $\chi_O(i)$ would be computed by
an algorithm $A_i$ specially designed to answer only the particular question “$i \in? O$”.
The computed values would then be stored in an external database and accessed
during the oracular computation. Alternatively, we might compute the values $\chi_O(i)$
on the fly during the oracular computation: Upon issuing a question “$i \in? O$”, an
external network of computers would be engaged in the computation of the answer
to the question. Of course, the o-TM would idle until the answer arrived.


Fig. 10.3 The oracle tape is replaced by a) a database with a finite number of precomputed values
$\chi_O(i)$, $i = 0,1,\ldots,m$; b) a network of computers that compute each value $\chi_O(i)$ separately upon
the o-TM’s request
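

The database variant can be sketched in Python as follows. The sketch rests on loud
assumptions: a genuinely undecidable oracle cannot be programmed, so the decidable set of
primes is used as a stand-in for O, and the "oracle algorithm" is an ordinary Python function
that receives the oracle as a callable. All names (`run_with_logging`, `make_database`,
`count_members_up_to`, `is_prime`) are hypothetical.

```python
def is_prime(i):
    """Stand-in oracle (decidable, for illustration only): 1 if i is prime, else 0."""
    return 1 if i >= 2 and all(i % d for d in range(2, int(i ** 0.5) + 1)) else 0


def count_members_up_to(oracle, x):
    """Example oracle algorithm: counts the members of the oracle set among 0..x."""
    return sum(oracle(i) for i in range(x + 1))


def run_with_logging(algorithm, oracle, x):
    """Run the algorithm and record every question issued to the oracle."""
    queries = []

    def logged_oracle(i):
        queries.append(i)
        return oracle(i)

    return algorithm(logged_oracle, x), queries


def make_database(oracle, m):
    """Precompute the answers for 0..m and return a lookup that replaces the oracle."""
    table = [oracle(i) for i in range(m + 1)]
    return lambda i: table[i]
```

For a given input, the largest logged query plays the role of $m$; a database built with that $m$
answers every question the same computation can ask.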


10.6 Practical Consequences: Online and Offline Computation 


In the real world of computer technology and telecommunications, often a device A 
is associated with some other device B. This association can be dynamic, changing 
between the states of connectedness and disconnectedness. We say that A is online 
when it is connected to B; that is, either B is under the direct control of A, or B 
is available for immediate use on A’s demand. Otherwise, we say that A is offline 
(disconnected). 

Let A be a computer and B any device capable of providing data. We say that A 
executes an online algorithm if the algorithm processes its input data in the order 
in which the data are fed to the algorithm by B. Thus, an online algorithm cannot 
know the entire input that will be fed to it, so it may be forced to correct its past 
actions when new input data arrive. In contrast, we say that A executes an offline 
algorithm if the algorithm is given the whole input data at the beginning of the 
computation. This allows it to first inspect the whole input and then choose the 
appropriate computation strategy. 

We can now see the parallels between online/offline computing and oracu- 
lar/ordinary computing. Ordinary Turing machines are offline; they are disconnected 
from any oracle, or any other device, such as an external database or computer net- 
work. Ordinary Turing programs are offline algorithms, because they are given the 
entire input before they are started. In contrast, oracle Turing machines are online; 
they are connected to an oracle or a device such as an external database or a com- 
puter network. Oracle Turing programs are online algorithms: Although a part of 
their input is written on the input tape at the beginning of the computation, the other 
part—the oracle’s advice—is processed piece by piece in a serial fashion, without 
knowing the advice that will be given on future questions.


10.7 Chapter Summary


An oracle Turing machine with an oracle set consists of a control unit, an input tape,
an oracle Turing program, an oracle tape, and a set O. The oracle tape contains all
the values of the characteristic function $\chi_O$, each of which can be accessed and read
in a single step. These values are demanded by instructions of the oracle Turing
program. Each instruction is of the form $\widetilde{\delta}(q,z,e) = (q',z',D)$, where $e$ is the value
of $\chi_O(w)$ and $w$ is the current word on the input tape starting in the window and
ending at the last non-space symbol. The instructions differ from the instructions
of an ordinary Turing machine. While oracle Turing programs are independent of
oracle sets, this is not true of their executions: Computations depend on the oracle’s
answers and hence on the oracle sets. Oracular computation is a generalization of
ordinary computation. Oracle TMs and their programs can be coded and enumerated.
With each oracle TM is associated its proper functional. Given an oracle set
O, a partial function can be O-computable, O-p.c., or O-incomputable; and a set
can be O-decidable, O-c.e., or O-undecidable. The Relative Computability Thesis
states that the oracle Turing machine formalizes the intuitive notion of the
“algorithm with external help.” In practice, the contents of the unrealistic oracle tape
could be approximated 1) by a finite sequence of precomputed values of $\chi_O$ (stored
in an external database) or 2) by an on-the-fly calculation of each demanded value
of $\chi_O$ separately (computed by a network of computers).


Chapter 11
Degrees of Unsolvability


Degree indicates the extent to which something happens or the 
amount something is felt. 


Abstract In Part II, we proved that besides computable problems there are also in- 
computable ones. So, given a computational problem, it makes sense to talk about its 
degree of unsolvability. Of course, at this point we only know of two such degrees: 
One is shared by all computable problems, and the other is shared by all incom- 
putable ones. (This will change, however, in the next chapter.) Nevertheless, the 
main aim of this chapter is to formalize the intuitive notion of degree of unsolv- 
ability. Building on the concept of the oracle Turing machine, we will first define 
the concept of the Turing reduction, the most general reduction between computa- 
tional problems. We will then proceed in a natural way to the definition of Turing 
degree—the formal counterpart of the intuitive notion of degree of unsolvability. 


11.1 Turing Reduction 


Until now we have been using the generic symbol O to denote an oracle set. From
now on we will be talking about particular oracle sets and denote them by the usual
symbols, e.g., $A, B$.

What does it mean when we say, for two sets $A, B \subseteq \mathbb{N}$, that “$A$ is $B$-decidable
in N”? By Definition 10.3 (p. 244), it means that the characteristic function $\chi_A$ is
$B$-computable on N. So, by Definition 10.2 (p. 243), there is a $B$-TM $T^B$ that can
compute $\chi_A(x)$ for any $x \in \mathbb{N}$. (Recall that $\chi_A$ is a total function by definition.)
Since the oracle can answer the question $n \in? B$ for any $n \in \mathbb{N}$, it makes the set $B$
appear decidable in N in the ordinary sense (see Definition 6.10 on p. 142).

We conclude: 


If A is B-decidable, then the decidability of B would imply the decidability of A. 


This relation between sets deserves a special name. Hence, the following definition.


Definition 11.1. (Turing Reduction) Let $A, B \subseteq \mathbb{N}$ be arbitrary sets. We say
that $A$ is Turing reducible (for short T-reducible) to $B$ if $A$ is $B$-decidable.
We denote this by

$A \le_T B$. (*)


Thus, if $A \le_T B$ then if $B$ were decidable also $A$ would be decidable. The
relation $\le_T$ is called the Turing reduction (for short T-reduction).


If $B$ is decidable (in the ordinary sense), then the oracle for $B$ is no mystery. Such
an oracle can be replaced by an ordinary decider $D_B$ of the set $B$ (see Sect. 6.3.3).
Consequently, the $B$-TM $T^B$ is equivalent to an ordinary TM, in the sense that what
one can compute the other can also compute. The construction of this TM is
straightforward: The TM must simulate the program $\widetilde{\delta}$ of $T^B$, with the exception that
whenever $T^B$ asks $x \in? B$, the TM must call $D_B$, submit $x$ to it, and wait for its decision.

The situation is quite different when $B$ is undecidable (in the ordinary sense).
In this case, the oracle for $B$ is more powerful than any ordinary TM (because no
ordinary TM can decide $B$). This makes $T^B$ more powerful than any ordinary TM
(because $T^B$ can decide $B$, simply by asking whether or not the input is in $B$).
In particular, if we replace a decidable oracle set with an undecidable c.e.
(semi-decidable) set, this may have “big” implications (see Problem 11.3).


11.1.1 Turing Reduction of a Computational Problem 


Why have we named $\le_T$ a reduction? What is reduced here, and what does the
reducing? Recall that each subset of N represents a decision problem (see Sect. 8.1.2).
Thus, $A$ is associated with the decision problem P = “$x \in? A$”, and $B$ with the
decision problem Q = “$x \in? B$”. The relation (*) can now be interpreted as follows:


If Q = “$x \in? B$” were decidable, then also P = “$x \in? A$” would be decidable.


Because of this we also use the sign $\le_T$ to relate the associated decision problems:
P $\le_T$ Q. (**)


As regards the oracle for $B$, we know what the oracle does: Since it can answer
any question $w \in? B$, it solves the problem Q. We can view it as a procedure, call
it B, for solving the problem Q. But when the set $B$ is undecidable, we cannot know how
the oracle finds the answers. In this case, the procedure B is a supernatural “algorithm”, whose
operation cannot be described and understood by a human; a black box, as it were.

Let us now look at $T^B$, the machine that computes the values of $\chi_A$, and let
$\widetilde{\delta}$ be its o-TP. For $\widetilde{\delta}$ we know what it does and how it does it, when it asks the
oracle and how it uses its answers. Hence, we can view $\widetilde{\delta}$ as an ordinary algorithm
A for solving P that can call the mysterious “algorithm” B. There is no limit on the
number of calls as long as this number is finite (otherwise, A would not halt).

Considering all this, we can interpret the relation (**) as follows:


If there were an algorithm B for solving Q, 
then there would be an algorithm A for solving P, 
where A could make finitely many calls to B. 


Since P would in principle be solved if Q were solved, we can focus on the problem
Q and on the design of the algorithm B. We say that we have reduced P to Q. If and
when Q is solved (i.e., B designed), also the problem P is, in principle, solved (by
A). Note that this means that the problem P is not more difficult to solve than the
problem Q. This is an alternative interpretation of the relation (**).

The interpretation described above is not bound to decision problems only. Take,
for instance, P = EXISTENCE OF SHORTER EQUIVALENT PROGRAMS (Sect. 8.3.3),
Q = SHORTEST EQUIVALENT PROGRAM (Sect. 9.5), and assume there is an
algorithm B for Q. Then, for any program $p$, the algorithm A (deciding whether or not
there is a shorter equivalent program) calls B on $p$ to obtain the shortest program $q$
equivalent to $p$, compares their lengths, and answers YES or NO. So P $\le_T$ Q. Things
change if we swap the two problems. Now the existence of B (deciding whether or
not there is a shorter equivalent program) does not help us to design A for finding
the shortest equivalent program $q$, because finding $q$ even among the finitely many
shorter programs is incomputable, and the oracle used by B cannot offer any help.


NB From now on we will limit our discussion to set-membership problems. In doing 
so, we will develop the theory on sets of natural numbers. (Decision problems will 
be mentioned only to give another view of notions, concepts, and theorems.) 


11.1.2 Some Basic Properties of the Turing Reduction 


We now list some of the properties of the T-reduction. First, we check that there
indeed exist two sets $A, B \subseteq \mathbb{N}$ that are related by $\le_T$. To see this, consider the
situation where $A$ is an arbitrary decidable set. From Definition 11.1 it immediately
follows that $A \le_T B$ for an arbitrary set $B$. Hence the following theorem.


Theorem 11.1. Let $A$ be a decidable set. Then $A \le_T B$ for an arbitrary set $B$.


But a set A need not be decidable to be T-reducible to some other set B. This 
will follow from the next simple theorem. 


Theorem 11.2. For every set $S$ it holds that $\overline{S} \le_T S$.


Proof. If $S$ were decidable, then (by Theorem 7.2, p. 156) also $\overline{S} = \mathbb{N} - S$ would be decidable.
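

The reduction used in this proof amounts to a single oracle call whose answer is flipped. A
minimal Python sketch follows; it assumes that the oracle for $S$ is passed in as a function
returning 0 or 1, and it uses the multiples of 3 as a decidable stand-in for $S$, since an
undecidable set cannot be programmed. The names are hypothetical.

```python
def decide_complement(oracle_S, x):
    """Decide membership of x in the complement of S with one call to the oracle for S."""
    return 1 - oracle_S(x)


def multiples_of_3(x):
    """Stand-in oracle set S (decidable, for illustration only)."""
    return 1 if x % 3 == 0 else 0
```

The function `decide_complement` never inspects how the oracle obtains its answer, so the
same code works for any set $S$, decidable or not, once an oracle for $S$ is available.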


Let $S$ be undecidable in the ordinary sense. Then $\overline{S}$ is undecidable too (Sect. 8.2.2).
Now Theorem 11.2 tells us that $\overline{S} \le_T S$, i.e., making $S$ appear decidable (by using
the oracle for $S$) also makes $\overline{S}$ appear decidable. Thus, a set $A$ need not be decidable
to be T-reducible to some other set $B$.

Is the T-reduction related to the m-reduction, which we defined in Sect. 9.2.2? If
so, is one of them more “powerful” than the other? The answer is yes. The next two
theorems tell us that the T-reduction is a nontrivial generalization of the m-reduction.


Theorem 11.3. If two sets are related by $\le_m$, then they are also related by $\le_T$.


Proof. Let $A \le_m B$. Then there is a computable function $r$ such that $x \in A \iff r(x) \in B$ (see
Sects. 9.2.1 and 9.2.2). Consequently, the characteristic functions $\chi_A$ and $\chi_B$ are related by the
equation $\chi_A = \chi_B \circ r$, where $\circ$ denotes function composition. We now see: If the function $\chi_B$ were
computable, then the composition $\chi_B \circ r$ would also be computable (because $r$ is computable, and
a composition of computable functions is a computable function); hence, $A \le_T B$.
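

The equation $\chi_A = \chi_B \circ r$ translates directly into code: an m-reduction $r$ yields an oracle
algorithm that makes exactly one call to the oracle for $B$. The sketch below is illustrative;
the example sets (even numbers for $A$, multiples of 4 for $B$) and the reduction $r(x) = 2x$ are
assumptions chosen only because they are easy to verify.

```python
def turing_reduction_from_m_reduction(r):
    """Turn an m-reduction r (x in A  iff  r(x) in B) into an oracle algorithm for A."""
    def decide_A(oracle_B, x):
        # chi_A(x) = chi_B(r(x)): a single oracle call on the transformed input.
        return oracle_B(r(x))
    return decide_A


def multiples_of_4(y):
    """Stand-in oracle for B (decidable, for illustration only)."""
    return 1 if y % 4 == 0 else 0


# Example m-reduction from A = even numbers to B = multiples of 4.
decide_even = turing_reduction_from_m_reduction(lambda x: 2 * x)
```

The resulting oracle algorithm is a very restricted one: it asks one question and returns the
answer unchanged, whereas a general T-reduction may ask many questions and post-process
the answers.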


However, the converse is not true, as we explain in the next theorem. 


Theorem 11.4. If two sets are related by $\le_T$, they may not be related by $\le_m$.


Proof. Theorem 11.2 states that $\overline{S} \le_T S$ for any set $S$. Is the same true of the relation $\le_m$? The
answer is no; there exist sets $S$ such that $\overline{S} \le_m S$ does not hold. To see this, let $S$ be an arbitrary
undecidable c.e. set. (Such is the set $K$; see Sect. 8.2.1.) The set $\overline{S}$ is not c.e. (otherwise $S$ would
be decidable by Theorem 7.3, p. 156). Now, if $\overline{S} \le_m S$ held, then $\overline{S}$ would be c.e. (by Theorem 9.1,
p. 213), which would be a contradiction.

In particular, $\overline{K} \le_T K$, but $\overline{K} \not\le_m K$. The same is true of $K_0$: $\overline{K_0} \le_T K_0$, but $\overline{K_0} \not\le_m K_0$.


We can use the T-reduction for proving the undecidability of sets as we use
the m-reduction. But the T-reduction must satisfy fewer conditions than the
m-reduction (see Definition 9.1, p. 211). This makes T-reductions easier to construct
than m-reductions. Indeed, Theorem 11.4 indicates that there are situations where
T-reduction is possible while m-reduction is not. So, let us develop a method of
proving the undecidability of sets that will use T-reductions. We will closely
follow the development of the method for m-reductions (see Sect. 9.2.3). First, from
Definition 11.1 we obtain the following theorem.


Theorem 11.5. For arbitrary sets $A$ and $B$ it holds:


$A \le_T B \,\land\, B$ is decidable $\implies$ $A$ is decidable


The contraposition is: $A$ is undecidable $\implies$ $A \not\le_T B \,\lor\, B$ is undecidable.
Assuming that $A \le_T B$, and using this in the contraposition, we obtain the next corollary.


Corollary 11.1. For arbitrary sets $A$ and $B$ it holds:


$A$ is undecidable $\land\, A \le_T B$ $\implies$ $B$ is undecidable


This reveals the following method for proving the undecidability of sets.


Method. The undecidability of a set $B$ can be proved as follows:


1. Suppose: $B$ is decidable; // Supposition.
2. Select: an undecidable set $A$;
3. Prove: $A \le_T B$;
4. Conclude: $A$ is decidable; // 1 and 3 and Theorem 11.5.
5. Contradiction between 2 and 4!
6. Conclude: $B$ is undecidable.


Remark. The method can easily be adapted to prove that a decision problem Q is undecidable.
First, suppose that there is a decider B for Q. Then choose a known undecidable decision problem
P and try to construct a decider A for P, where A can make calls to B. If we succeed, P is
decidable. Since this is a contradiction, we reject the supposition. So, Q is undecidable.


Turing reduction is a relation that has another two important properties. 


Theorem 11.6. Turing reduction $\le_T$ is a reflexive and transitive relation.


Proof. (Reflexivity) This is trivial. Let $S$ be an arbitrary set. If $S$ were decidable, then, of course,
the same $S$ would be decidable. Hence, $S \le_T S$ by Definition 11.1. (Transitivity) Let $A, B, C$ be
arbitrary sets and suppose that $A \le_T B \land B \le_T C$. So, if $C$ were decidable, $B$ would also be
decidable (because $B \le_T C$), but the latter would then imply the decidability of $A$ (because $A \le_T B$).
Hence, the decidability of $C$ would imply the decidability of $A$, i.e., $A \le_T C$.
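

Transitivity can also be seen operationally: an oracle algorithm for $A$ relative to $B$ and an
oracle algorithm for $B$ relative to $C$ compose into an oracle algorithm for $A$ relative to $C$. The
sketch below assumes that oracle algorithms are Python functions taking the oracle as their
first argument; the example sets (even numbers, multiples of 4, multiples of 8) are decidable
stand-ins chosen for illustration.

```python
def compose_reductions(decide_A_with_B, decide_B_with_C):
    """Combine A <=_T B and B <=_T C into an oracle algorithm deciding A relative to C."""
    def decide_A_with_C(oracle_C, x):
        # Every question "y in B?" is answered by running the second algorithm,
        # which in turn only asks questions about C.
        def simulated_oracle_B(y):
            return decide_B_with_C(oracle_C, y)
        return decide_A_with_B(simulated_oracle_B, x)
    return decide_A_with_C


def even_with_mult4(oracle_mult4, x):
    """A = even numbers, decided relative to B = multiples of 4."""
    return oracle_mult4(2 * x)


def mult4_with_mult8(oracle_mult8, y):
    """B = multiples of 4, decided relative to C = multiples of 8."""
    return oracle_mult8(2 * y)


def multiples_of_8(z):
    """Stand-in oracle for C (decidable, for illustration only)."""
    return 1 if z % 8 == 0 else 0


even_with_mult8 = compose_reductions(even_with_mult4, mult4_with_mult8)
```

Each question to the simulated oracle triggers finitely many questions to the oracle for $C$, so
a halting computation relative to $B$ becomes a halting computation relative to $C$.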


Generally, a reflexive and transitive binary relation is called a preorder, and a set 
equipped with such a relation is said to be preordered by this relation. 

Given a preordered set, one of the first things to do is to check whether its pre- 
order qualifies for any of the more interesting orders (see Appendix A). Because 
these orders have additional properties, they reveal much more about their domains. 
Two such orders are the equivalence relation and the partial order. 

The above theorem tells us that $\le_T$ is a preorder on $2^{\mathbb{N}}$. Is it, perhaps, even an
equivalence relation? To check this we must see whether $\le_T$ is a symmetric relation,
i.e., whether $A \le_T B$ implies $B \le_T A$, for arbitrary $A, B$. But we are already able
to point at two sets for which the implication does not hold: These are the empty
set $\emptyset$ and the diagonal set $K$ (see Definition 8.6, p. 181). Namely, $\emptyset \le_T K$ (due to
Theorem 11.1) while $K \not\le_T \emptyset$ (because $\emptyset$ is decidable and $K$ undecidable). Thus we
conclude:


Turing reduction is not symmetric and, consequently, not an equivalence relation. 


Although this result is negative, we will use it in the next subsection.


11.2 Turing Degrees


We are now ready to formalize the intuitive notion of degree of unsolvability. Its
formal counterpart will be called Turing degree. The path to the definition of Turing
degree will be a short one: First, we will use the relation ≤_T to define a new relation
≡_T; then we will prove that ≡_T is an equivalence relation; and finally, we will define
Turing degrees to be the equivalence classes of ≡_T.

We have proved in the previous subsection that ≤_T is not symmetric, because
there exist sets A and B such that A ≤_T B and B ≰_T A. However, neither is ≤_T
asymmetric, because there do exist sets A and B for which A ≤_T B and B ≤_T A.
For example, this is the case when both A and B are decidable. (This follows from
Theorem 11.1.) Thus, for some pairs of sets the relation ≤_T is symmetric, while for
others it is not. In this situation, we can define a new binary relation that will tell,
for any two sets, whether or not ≤_T is symmetric for them. Here is the definition of
the new relation.


Definition 11.2. (Turing Equivalence) Let A, B ⊆ N be arbitrary sets. We say that
A is Turing-equivalent (for short, T-equivalent) to B if A ≤_T B ∧ B ≤_T A.
We denote this by

A ≡_T B


and read: If one of A, B were decidable, the other would also be decidable.
The relation ≡_T is called the Turing equivalence (for short, T-equivalence).


From the above definition it follows that A ≡_T B for any decidable sets A, B.
What about undecidable sets? Can two such sets be T-equivalent? The answer is
yes; it will follow from the next theorem.


Theorem 11.7. For every set S it holds that S ≡_T S̄, where S̄ is the complement of S.


Proof. Let S be an arbitrary set. Then S̄ ≤_T S (by Theorem 11.2). Now focus on the set S̄. The
same theorem tells us that the complement of S̄ is T-reducible to S̄, i.e., S ≤_T S̄. So S ≡_T S̄.
Note that S can be undecidable.
Alternatively: If one of the characteristic functions χ_S, χ_S̄ were computable, the other would also
be computable (because χ_S̄ = 1 − χ_S).
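The alternative argument translates directly into a one-query oracle computation. The sketch below assumes that an oracle is given as a Python function returning the value of the characteristic function; the set of even numbers is a decidable stand-in chosen only so that the code can be run, and the names are hypothetical.

```python
from typing import Callable

CharFunction = Callable[[int], int]  # 1 if n is in the set, 0 otherwise


def complement_decider(chi_s: CharFunction) -> CharFunction:
    """Decide the complement of S with a single query to the oracle for S."""
    return lambda n: 1 - chi_s(n)


def chi_even(n: int) -> int:
    """Toy oracle (an assumption of this sketch): S is the set of even numbers."""
    return 1 if n % 2 == 0 else 0


chi_odd = complement_decider(chi_even)        # the complement reduces to S
chi_even_again = complement_decider(chi_odd)  # and S reduces to its complement
```

Applying complement_decider twice returns a decider for the original set, which is the symmetry expressed by the theorem.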


Calling ≡_T an “equivalence relation” is justified. The reader should have no trouble
in proving that the relation ≡_T is reflexive, transitive, and symmetric. Therefore,
≡_T is an equivalence relation on 2^N, the power set of the set N.

Now, being an equivalence relation, the relation ≡_T partitions the set 2^N into
≡_T-equivalence classes. Each ≡_T-equivalence class contains as its elements all the
subsets of N that are T-equivalent to one another. It will soon turn out that
≡_T-equivalence classes are one of the central notions of relativized computability. The
next definition introduces their naming.


Definition 11.3. (Turing Degree) The Turing degree (for short, T-degree) of a set S,
denoted by deg(S), is the equivalence class {X ∈ 2^N | X ≡_T S}.


Given a set S, the T-degree deg(S) by definition contains as its elements all the
sets that are T-equivalent to S. Since ≡_T is symmetric and transitive, these sets are
also T-equivalent to one another. Thus, if one of them were decidable, all would
be decidable. In other words, the question x ∈? X (i.e., the membership problem) is
equally (un)decidable for each of the sets X ∈ deg(S). Informally, this means that
the information about what is and what is not in one of them is equal to the
corresponding information of any other set. In short, the sets X ∈ deg(S) bear the same
information about their contents.
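The step from a preorder to its equivalence classes can be tried out on a finite toy example. The sketch below does not compute Turing degrees (that is impossible in general); it only shows, for an arbitrary preorder supplied as a Python predicate, how the relation "x ≤ y and y ≤ x" partitions a finite set. The preorder on words by length and all names are assumptions made for the illustration.

```python
from typing import Callable, Hashable, Iterable


def degrees(elements: Iterable[Hashable],
            leq: Callable[[Hashable, Hashable], bool]) -> list[frozenset]:
    """Partition elements into classes of x ~ y  iff  leq(x, y) and leq(y, x)."""
    classes: list[set] = []
    for x in elements:
        for cls in classes:
            rep = next(iter(cls))  # any representative will do, since ~ is transitive
            if leq(x, rep) and leq(rep, x):
                cls.add(x)
                break
        else:
            classes.append({x})    # x starts a new class
    return [frozenset(c) for c in classes]


# Toy preorder (an assumption of this sketch): x <= y iff len(x) <= len(y).
words = ["a", "b", "ab", "ba", "abc"]
classes = degrees(words, lambda x, y: len(x) <= len(y))
```

For these words the classes are {a, b}, {ab, ba}, and {abc}; the words in one class are indistinguishable by the preorder, just as the sets in one T-degree are indistinguishable by ≤_T.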


Remark. We now put deg(S) in the light of decision problems. Let X, Y ∈ deg(S) be any sets.
Associated with X and Y are decision problems P = “x ∈? X” and Q = “y ∈? Y”, respectively.
Since X ≡_T Y, we have P ≤_T Q and Q ≤_T P. This means that if either of the problems P, Q
were decidable, the other one would also be decidable. Thus, P and Q are equally (un)decidable.
Now we see that the class of all decision problems whose languages are in the T-degree deg(S)
represents a certain degree of unsolvability, which we are faced with when we try to solve any of
these problems. Problems associated with deg(S) are equally (un)solvable, i.e., equally difficult.


Based on this we declare the following formalization of the intuitive notion of 
the degree of unsolvability of decision problems. 


Formalization. The intuitive notion of the degree of unsolvability is formalized by 


“degree of unsolvability” ⟷ Turing degree


Remark. Since the concept of “degree of unsolvability” is formalized by the T-degree, we will no 
longer distinguish between the two. We will no longer use quotation marks to distinguish between 
its intuitive and formal meaning. 


NB This formalization opens the door to a mathematical treatment of our intuitive, 
vague awareness that solvable and unsolvable problems differ in something that we 
intuitively called the degree of unsolvability. 


Intuitively, we expect that the degree of unsolvability of decidable sets differs from
the degree of unsolvability of undecidable sets. So let us prove that indeed there are
two different corresponding T-degrees.

First, let S be an arbitrary decidable set. Then deg(S) contains exactly all decidable
sets. As the empty set ∅ is decidable, we have ∅ ∈ deg(S), so deg(S) = deg(∅).
This is why we usually denote the class of all decidable sets by deg(∅).

Second, we have seen (p. 255) that ∅ ≤_T K ∧ K ≰_T ∅. Therefore, ∅ ≢_T K and
hence deg(∅) ≠ deg(K). But deg(∅) and deg(K) are ≡_T-equivalence classes, so they
share no elements. We can now conclude as we expected to: deg(∅) and deg(K) represent
two different degrees of unsolvability. We have proved the following theorem.


Theorem 11.8. There exist at least two T-degrees:


deg(∅) = {X | X ≡_T ∅},
deg(K) = {X | X ≡_T K}.


The Relation < 


It is natural to say that a decidable decision problem is “less difficult to solve” than
an undecidable one. We will now formalize the intuitively understood relation of
“being less difficult to solve.” To do this, we will introduce a new binary relation,
denoted by <, which will be capable of expressing formally that deg(∅) represents a
degree of unsolvability that is “lower” than the degree of unsolvability represented
by deg(K). The definition of < will be straightforward. Let us denote the irreflexive
reduction of the relation ≤_T as usual by <_T, i.e., A <_T B ⟺ A ≤_T B ∧ A ≢_T B.
(Thus A <_T B ⟺ A ≤_T B ∧ B ≰_T A.) Then the sought-for relation < is induced
by the relation <_T, as the following definition describes.


Definition 11.4. (Relation <) Let deg(A) and deg(B) be arbitrary T-degrees.
Then deg(A) is lower than deg(B), denoted by deg(A) < deg(B), if A <_T B.


When deg(A) < deg(B), we also say that deg(B) is higher than deg(A).
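Since the definition refers to representatives, one may wish to check that it does not depend on which representatives are chosen. The following derivation is offered as a sketch; it uses only the transitivity of the T-reduction (Theorem 11.6) and the definition of T-equivalence, and the names A_1, B_1 for the alternative representatives are introduced here for the purpose. Suppose A is T-equivalent to A_1, B is T-equivalent to B_1, and A is strictly T-reducible to B.

```latex
\begin{align*}
A_1 \le_T A,\quad A \le_T B,\quad B \le_T B_1
  &\;\Longrightarrow\; A_1 \le_T B_1, \\
B_1 \le_T A_1
  &\;\Longrightarrow\; B \le_T B_1 \le_T A_1 \le_T A
   \;\Longrightarrow\; B \le_T A .
\end{align*}
```

The conclusion of the second line contradicts the assumption that B is not T-reducible to A. Hence B_1 is not T-reducible to A_1, and together with the first line this gives that A_1 is strictly T-reducible to B_1, so any pair of representatives yields the same verdict.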


Fig. 11.1 Turing degree deg(A) is lower than deg(B), i.e., deg(A) < deg(B). The relation < between the degrees deg(A) and deg(B) is induced by the relation <_T between the representatives A and B


Intuitively, a T-degree should not be lower than itself, because this would not
agree with the intended meaning of the relation <. So we ask: Is the relation <
irreflexive? The answer is yes, as is assured by the next, simple theorem.


Theorem 11.9. The relation < is irreflexive.


Proof. Let S be an arbitrary set. Suppose that deg(S) < deg(S). Then S <_T S (by
Definition 11.4), i.e., S ≤_T S ∧ S ≢_T S (by definition of <_T). Hence S ≢_T S. But this is a
contradiction, because ≡_T is reflexive.


Since ∅ <_T K, we can now write the statement


The degree of unsolvability of decidable decision problems is lower than the degree of
unsolvability of undecidable decision problems T-equivalent to the Halting Problem


in a formal way as

deg(∅) < deg(K).


But we have intuitively anticipated this! What is so special about the above formal
statement? Isn’t this much ado about nothing? The answer is that the formalization
that we have just made will enable us to discover in the next chapter the surprising
fact that there are many other degrees of unsolvability. We will show this by the
construction of T-degrees that differ from deg(∅) and deg(K).


A set A is Turing reducible to a set B, written A ≤_T B, if A is B-decidable. Thus, if
A ≤_T B, then if B were decidable, A would also be decidable.

The T-reduction ≤_T is a generalization of ≤_m, the m-reduction; if two sets are
related by ≤_T, they may not be related by ≤_m.

A decidable set is T-reducible to any set. The complement of a set is T-reducible
to the set. If a set A is T-reducible to a decidable set, then A is decidable. If an
undecidable set is T-reducible to a set B, then B is undecidable.

Correspondingly, a decision problem P can be T-reducible to a decision problem
Q, written P ≤_T Q. If P ≤_T Q, then if Q were decidable, P would also be decidable.

If a decision problem P is T-reducible to a decidable decision problem, then P is
decidable. If an undecidable decision problem is T-reducible to a decision problem
Q, then Q is undecidable.

Generally, a computational problem P can be T-reducible to a computational
problem Q, written P ≤_T Q. If P ≤_T Q, then the computability of Q would imply
the computability of P.

If a computational problem P is T-reducible to a computable computational
problem, then P is computable. If an incomputable computational problem is
T-reducible to a computational problem Q, then Q is incomputable.

The T-reduction is reflexive and transitive, but not symmetric.

Two sets are said to be Turing-equivalent if each of them is T-reducible to the
other. T-equivalence is an equivalence relation.

A Turing degree of a set A, denoted by deg(A), is the class of all sets that are 
T-equivalent to A. T-degree is a formalization of the informal notion of “degree of 
unsolvability”. 

There are (at least) two T-degrees: deg(∅) and deg(K). The first corresponds to
the degree of unsolvability of all decidable sets (i.e., decidable decision problems).
The second corresponds to the degree of unsolvability of all undecidable sets that
are T-equivalent to the set K, that is, the degree of unsolvability of all undecidable
decision problems that are T-equivalent to the Halting Problem D_H.

A T-degree deg(A) is said to be lower than a T-degree deg(B) if A is T-reducible 
to B, but not vice versa. 


Problems 


11.1. The relations ≤_T and ⊆ are, generally, independent of each other. That is to say, we can have
A ≤_T B simultaneously with any of the relations A ⊆ B and A ⊇ B, or neither of the two.
Can you give examples of such sets A and B?


11.2. Prove:
(a) A is a c.e. set ⟹ A ≤_T K.


(b) A, B are disjoint c.e. sets ⟹ A ≤_T A ∪ B and B ≤_T A ∪ B.


11.3. Prove: There exist sets A and B such that A is not c.e., B is (undecidable) c.e., and A ≤_T B.
[Hint. Consider the Halting Problem and the non-Halting Problem.]


Remark. Therefore, a c.e. oracle set B can make decidable even a non-c.e. set A.


11.4. Prove: ≡_T is an equivalence relation.


Chapter 12
The Turing Hierarchy of Unsolvability


A hierarchy is a system of organizing things into different ranks, 
levels, or positions, depending on how important they are. 


Abstract At this point we only know of two degrees of unsolvability: the T-degree
shared by all the decidable decision problems, and the T-degree shared by all
undecidable decision problems that are T-equivalent to the Halting Problem. In this
chapter we will prove that, surprisingly, for every undecidable decision problem
there exists a more difficult decision problem. This will in effect mean that there is
an infinite hierarchy of degrees of unsolvability and that there is no most difficult
decision problem.


12.1 The Perplexities of Unsolvability 


Turing degrees deg(∅) and deg(K) are the only T-degrees whose existence we have
intuitively anticipated and formally proved. Are deg(∅) and deg(K) the only existing
T-degrees? Put differently: Is every decision problem either decidable or exactly as
difficult as the Halting Problem D_H? If the answer were no, then there would be
a decision problem that would be undecidable because of some reason essentially
different from the reason for which D_H is undecidable. If so, this would immediately
raise the following questions:


1. Are there undecidable decision problems that are more difficult than D_H?
2. Are there undecidable decision problems that are less difficult than D_H?


But, would all this make any sense? We could not compare the difficulties of 
undecidable problems just by comparing the times elapsed to obtain their solutions, 
as the computations could run indefinitely and return no solutions at all. For the 
same reason, neither could we use any other measure of the quality of the solutions. 
The situation we would face is illustrated in Fig. 12.1 (for computational problems). 

A way out of this situation is to use the Turing reduction. The idea is that for two
undecidable decision problems P and Q, we consider Q to be more difficult than P
if P <_T Q. Equivalently, Q is considered more difficult than P if deg(P) < deg(Q).


[Cartoon dialogue: “Becky, my problem is incomputable and more difficult than yours.”
“Does that mean that you'll obtain no solution only after I don't get mine?”]

Fig. 12.1 How much later than never can we obtain a solution to a more difficult incomputable
problem? How much less than no solution can we get when solving a more difficult incomputable
problem? Is there any sensible definition of the property “to be a more difficult incomputable
problem” at all? The answer is yes; the way out of these perplexities is to use the Turing reduction


We will now focus on question 1: Are there undecidable decision problems that
are more difficult than D_H?¹ So, is there a problem Q such that D_H <_T Q? If so, can
we construct it? The answer is yes; to construct such a Q we must use a mapping
called the Turing jump operator. This is the subject of the next section.


12.2 The Turing Jump 


Let S be an arbitrary set. The Turing jump operator is a mapping ' : 2^N → 2^N that
assigns (i.e., constructs) to the set S another set, denoted by S', whose T-degree is
higher than deg(S). How does the mapping ' do this?

First, recall the Halting Problem D_H:


“Does T halt on input ⟨T⟩?”


The language of this problem is the set K; it contains the codes of all ordinary Turing
machines T that halt on their own codes ⟨T⟩. Since ⟨T⟩ is just a binary-represented
index x, and φ_x is the proper function of the ordinary Turing machine T_x, we can
rewrite K as follows:


K = {⟨T⟩ | T halts on input ⟨T⟩}
  = {x | T_x halts on input x}
  = {x | φ_x(x)↓}.


Secondly, let S be an arbitrary set. Let us wake up the oracle for S, make it available
to each o-TM T*, and consider the obtained S-TM T^S. Recall from Sect. 10.1.4
that we can encode each T^S with ⟨T^S⟩ and interpret this as a natural number, the
index of T^S. Following the above definition of the Halting Problem, we now define
the halting problem for oracle Turing machines T^S:


“Does T^S halt on input ⟨T^S⟩?”


¹ We will answer question 2 in Chapter 13 (Theorem 13.5).


Let us denote the language of this problem by K^S. The set K^S contains the codes
of all S-TMs T^S that halt on their own codes ⟨T^S⟩. Again, each ⟨T^S⟩ is a
binary-represented index x of T^S. Recalling from Sect. 10.2.1 that Φ_x^S is the proper
functional of T_x^S, we rewrite the set K^S as follows:


K^S = {⟨T^S⟩ | T^S halts on input ⟨T^S⟩}
    = {x | T_x^S halts on input x}
    = {x | Φ_x^S(x)↓}.


So, given an arbitrary set S ⊆ N, we have constructed a new set K^S ⊆ N.

Thirdly, we define ' to be the mapping that sends a set S to the set K^S. In plain
words, the mapping ' operates so that it “elevates” its argument S to an oracle set,
i.e., makes S “jump” on the set K. This is why ' is called the Turing jump operator,
and the set K^S the Turing jump of the set S. We also denote K^S by S'. Here is the
official definition.


Definition 12.1. (Turing Jump of a Set) The Turing jump of a set S is the set S'
defined by
S' = K^S = {x | Φ_x^S(x)↓}.


12.2.1 Properties of the Turing Jump of a Set 


The main result of this subsection will be Corollary 12.1, which states that S and S’ 
are of different and comparable T-degrees. 

First, of course, both S and S’ are sets. But the oracle for S’ is more powerful than 
the oracle for S. Indeed, the following lemma tells us that if the oracle for S makes 
a set appear S-c.e., then the oracle for S’ makes the same set appear S’-decidable. 


Lemma 12.1. A is S-c.e. ⟹ A is S'-decidable.


Proof. Let A be an arbitrary S-c.e. set. Define a binary functional Φ_x^S as follows: Φ_x^S(y,z) = 1
if y ∈ A, and Φ_x^S(y,z)↑ if y ∉ A. We will not need the actual value of x, but note that x is fixed.
The argument z has no impact on Φ_x^S; it is there only because we want it to remain the only
argument after the application of the Parameter Theorem. The functional Φ_x^S is S-p.c. (as it is
S-computable on A). Let us apply the Parameter Theorem and move y from Φ_x^S(y,z) to the index;
hence Φ_x^S(y,z) = Φ_{s(x,y)}^S(z), for an injective computable function s (see Sect. 7.3). Now observe
that the following equivalences hold: y ∈ A ⟺ Φ_{s(x,y)}^S(s(x,y))↓ ⟺ s(x,y) ∈ K^S. So we have
y ∈ A ⟺ s(x,y) ∈ S', where s is a computable function and x fixed. This means that A is
m-reducible to S', i.e., A ≤_m S'. Then, A ≤_T S' (by Theorem 11.3), and A is S'-decidable.
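The step that moves the parameter y into the index can be imitated with programs represented as source text. This is only a sketch: Python source strings play the role of indices, the function named s below is a hypothetical stand-in for the function supplied by the Parameter Theorem, the oracle is left out entirely, and the body of the two-argument program is made up for the illustration.

```python
def s(program: str, y: int) -> str:
    """Return the source of a one-argument program that behaves like `program`
    with its first argument fixed to y (the parameter moves into the code)."""
    return (
        program
        + "\n_phi_two_args = phi\n"
        + f"def phi(z):\n    return _phi_two_args({y!r}, z)\n"
    )


def run(program: str, *args):
    """Execute a program given as source text; the program must define `phi`."""
    namespace: dict = {}
    exec(program, namespace)
    return namespace["phi"](*args)


# A two-argument program phi(y, z); its body is an assumption of this sketch.
TWO_ARGS = "def phi(y, z):\n    return 10 * y + z\n"

specialized = s(TWO_ARGS, 7)   # a new "index" with y = 7 built in
```

Different values of y give different source texts, which mirrors the injectivity of s, and running the specialized program on z gives the same result as running the original program on (7, z).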
The next theorem states that every set S is S'-decidable.


Theorem 12.1. Let S be an arbitrary set. Then S ≤_T S'.


Proof. Since S ≤_T S holds for every S, the set S is S-decidable and, a fortiori, S-c.e. Then
Lemma 12.1 (with A := S) tells us that S is S'-decidable; that is, S ≤_T S'.


The converse is not true. The next theorem states that S’ is S-undecidable. How- 
ever, the theorem guarantees that the S-undecidability of S’ is not “excessive”; 
specifically, it tells us that S’ is S-c.e. In short, although there is no S-TM capable 
of deciding the set S’, there is an S-TM capable of recognizing S’. 


Theorem 12.2. Let S be an arbitrary set. Then: 


a) S' is S-undecidable (i.e., S' ≰_T S).
b) S' is S-c.e. 


Proof. The proof runs along the same lines as the proof of Lemma 8.1 (see Sect. 8.2) that K is
undecidable. The main difference is that now we will be talking of S-TMs (instead of ordinary
TMs) and of the set K^S (instead of K). We therefore move at a somewhat faster pace.


a) Suppose that S' ≤_T S. Then the set S' = {⟨T^S⟩ | T^S halts on input ⟨T^S⟩} is S-decidable, and
there is an S-TM capable of deciding the question “Does T^S halt on input ⟨T^S⟩?” for any T^S. Let
us denote this hypothetical decider by D_{K^S}.

Now we construct a new S-TM that will use D_{K^S}. Since it will call an S-TM, it will itself be
an S-TM. So let us denote it by N^S. The input to N^S will be the code ⟨T^S⟩ of an arbitrary T^S.
The machine N^S must operate as follows. First, it doubles the input ⟨T^S⟩ into ⟨T^S, T^S⟩, and then
sends this to D_{K^S}. The decider takes this as the question ⟨T^S, T^S⟩ ∈? K^S, eventually halts, and
answers either YES or NO. If the answer is YES, then N^S calls D_{K^S} again with the same question;
otherwise, N^S outputs its own answer YES and halts.

But there is a catch: If N^S is given as input its own code ⟨N^S⟩, it puts D_{K^S} in trouble.
Namely, if D_{K^S} has answered the first question ⟨N^S, N^S⟩ ∈? K^S with YES, then N^S starts endless
cycling, during which D_{K^S} stubbornly repeats that N^S will halt. If, however, D_{K^S} has answered
⟨N^S, N^S⟩ ∈? K^S with NO, and so predicted that N^S would not halt, N^S halts in the very next step.

In short, it is not true that D_{K^S} correctly answers any question ⟨T^S, T^S⟩ ∈? K^S. Actually, it fails
when T^S := N^S. This contradicts our supposition. Consequently, S' ≰_T S.
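The construction of the contrary machine can be played through in a small model. The sketch below makes several simplifying assumptions: machines are Python functions that take a machine as input, endless cycling is not executed but represented by a returned label, the oracle is omitted (the argument is the same with or without it), and the function names are hypothetical. Whatever total function is offered as the claimed decider, its prediction about the contrary machine on its own code turns out to be wrong.

```python
from typing import Callable

HALTS, LOOPS = "halts", "loops forever"


def make_n(claimed_decider: Callable[[Callable], bool]) -> Callable:
    """Build the machine N that does the opposite of what the decider predicts."""
    def n(machine: Callable) -> str:
        # Ask the decider: "does `machine` halt on its own code?"
        if claimed_decider(machine):
            return LOOPS   # answer YES: N would keep asking the same question forever
        return HALTS       # answer NO: N outputs YES and halts
    return n


def refuted(claimed_decider: Callable[[Callable], bool]) -> bool:
    """True if the decider's prediction about N on input N differs from N's behavior."""
    n = make_n(claimed_decider)
    predicted_to_halt = claimed_decider(n)
    actually_halts = n(n) == HALTS
    return predicted_to_halt != actually_halts
```

In this model refuted returns True for every deterministic claimed decider, because the behavior of N on its own code is defined as the negation of the decider's answer.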


b) Let R^S be an S-TM as follows. The input to R^S is the code ⟨T^S⟩ of an arbitrary T^S. R^S
starts simulating T^S on ⟨T^S⟩, and if the simulation halts (i.e., T^S would halt on ⟨T^S⟩), then it outputs YES.
So R^S is a recognizer of the set {⟨T^S⟩ | T^S halts on input ⟨T^S⟩} = S'. Hence, S' is S-c.e.
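A recognizer of this kind can be sketched as a step-by-step simulation. The model below is an assumption made for illustration: a machine is a Python generator function whose every yield counts as one step, the input is a natural number rather than a code, and the oracle is omitted. The optional step limit is not part of the recognizer in the proof; it is added only so that non-halting runs can be observed without waiting forever.

```python
from typing import Callable, Iterator, Optional


def recognize(machine: Callable[[int], Iterator[None]], x: int,
              max_steps: Optional[int] = None) -> Optional[bool]:
    """Simulate machine on x and return True (YES) if the simulation halts.
    Without a step limit the call does not return on non-halting runs,
    as is allowed for a recognizer; with a limit, None means "no answer yet"."""
    steps = 0
    for _ in machine(x):
        steps += 1
        if max_steps is not None and steps >= max_steps:
            return None
    return True


def halts_on_even(x: int) -> Iterator[None]:
    """Toy machine: halts after x steps if x is even, runs forever if x is odd."""
    if x % 2 == 0:
        for _ in range(x):
            yield
    else:
        while True:
            yield
```

The recognizer never answers NO: on inputs outside the recognized set it simply keeps simulating, which is why the existence of a recognizer gives only the c.e. property and not decidability.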


From Theorems 12.1 and 12.2a it follows that S <_T S', for any set S. Therefore,
S and S' are not T-equivalent, but they are still of comparable T-degrees.


Corollary 12.1. Let S be an arbitrary set. Then: deg(S) < deg(S’). 


By taking S := K in the above corollary, we obtain deg(K) < deg(K').


NB We have discovered that there is a T-degree that is higher than deg(K). Hence,
there exist decision problems that are more difficult than the Halting Problem.


12.3 Hierarchies of T-Degrees


Because S' is a set, we can apply the function ' to S' too. This leads to an even
higher T-degree deg((S')'). Since we can repeat this as many times as we wish, it
follows that there are higher and higher <-comparable T-degrees, and this never
ends. We conclude that there is a hierarchy of at least ℵ₀ T-degrees and that there is
no highest T-degree. Let us now see the details.

In the same fashion as we constructed the Turing jump of the set S, we can
construct the Turing jump of the set S': We take the oracle for S' (= K^S), make
it available to each o-TM T*, define for the obtained S'-TM T^{S'} (= T^{K^S}) the
corresponding halting problem, and finally obtain the associated language (set) K^{S'}.
Then, we can rewrite K^{S'} as follows:


K^{S'} = (K^S)' = (S')'.


We call this set the second Turing jump of S and denote it simply by S'' or S^(2).
In the same manner we define the sets S^(3), S^(4), .... In general, we construct
S^(n+1) by “elevating” S^(n) to the oracle set, making it available to all o-TMs, and
collecting in S^(n+1) the codes ⟨T^{S^(n)}⟩ of those S^(n)-TMs that halt on their own codes:


S^(n+1) = {⟨T^{S^(n)}⟩ | T^{S^(n)} halts on input ⟨T^{S^(n)}⟩}
        = {x | T_x^{S^(n)} halts on input x}
        = {x | Φ_x^{S^(n)}(x)↓}.


Definition 12.2. (nth Turing Jump) The nth Turing jump of the set S is the
set S^(n), which is inductively defined as follows:


S^(0) = S,
S^(n+1) = (S^(n))'.


The relation between S^(n) and S^(n+1) is described by the following theorem. The
proof of the theorem would run in the same way as the proofs of Lemma 12.1,
Theorems 12.1 and 12.2, and Corollary 12.1. We therefore leave it as an exercise to
the reader.


Theorem 12.3. Let S be an arbitrary set. Then:
a) S^(n) <_T S^(n+1)
b) S^(n+1) is S^(n)-c.e.
c) deg(S^(n)) < deg(S^(n+1))


We see that each set S is the origin of an infinite hierarchy of sets:

S^(0) <_T S^(1) <_T S^(2) <_T ... <_T S^(i) <_T S^(i+1) <_T ...

Associated with this hierarchy is the infinite hierarchy of T-degrees:

deg(S^(0)) < deg(S^(1)) < deg(S^(2)) < ... < deg(S^(i)) < deg(S^(i+1)) < ...


There are at least ℵ₀ T-degrees that are comparable with the relation <. Why
have we said at least? The only reason is that we are cautious: At this point,
nothing excludes the possibility of the existence of T-degrees between deg(S^(i))
and deg(S^(i+1)) for some i. Such T-degrees would be passed over by the T-jump
and hence not constructible by it.

We have thus discovered one more surprising fact: 


NB For every degree of unsolvability there is a higher degree of unsolvability. For 
every decision problem, even an undecidable one, there is a more difficult decision 
problem. There is no most difficult decision problem. 


12.3.1 The Jump Hierarchy 


Until now, S denoted an arbitrary subset of N. In this subsection we will choose for 
S a particular set. 

To do this, we intuitively reason as follows. The power of the oracle for S will
depend on the (un)decidability of the chosen set. If we choose for S a decidable set,
then we expect that the oracle for S will help o-TMs as little as possible. Actually,
an o-TM with such an oracle will be of equivalent power to an ordinary TM (see
Sect. 11.1). We therefore expect that the difference between two successive Turing
jumps of the set S, i.e., the sets S^(i) and S^(i+1), will be as small as possible. This
should result in a denser hierarchy deg(S^(i)), i = 0, 1, 2, ..., that would, hopefully,
reveal all² T-degrees and hence more of their properties.

So, for S we will pick a decidable set. As decidable sets are T-equivalent, and
the set ∅ is decidable, we will take S := ∅. We obtain the following jump hierarchy
of sets

∅^(0) <_T ∅^(1) <_T ∅^(2) <_T ... <_T ∅^(i) <_T ∅^(i+1) <_T ...


and the associated jump hierarchy of T-degrees


deg(∅^(0)) < deg(∅^(1)) < deg(∅^(2)) < ... < deg(∅^(i)) < deg(∅^(i+1)) < ...


² However, we will learn in Chap. 13 that this reasoning is too optimistic. It will prove again that
intuition can be misleading.


Example 12.1. (Degrees of Some Sets) The undecidable sets K_0, K_1, Fin, Cof, Tot, Ext, which we
defined in Sects. 8.2 and 8.3.5, belong to the initial T-degrees of the jump hierarchy. In particular,


K, K_0, K_1 ∈ deg(∅^(1))
Fin, Tot ∈ deg(∅^(2))
Cof, Ext ∈ deg(∅^(3))

This tells us that


• the following three incomputable problems are equally difficult:


D_H = “Does T halt on input ⟨T⟩?”
D_Halt = “Does T halt on input w?”
D_{K_1} = “Is dom(φ) empty?”


• the next two decision problems are equally difficult, yet more difficult than the above three:


D_Fin = “Is dom(φ) finite?”
D_Tot = “Is φ total?”


• the next two decision problems are equally difficult, but more difficult than the above five:


D_Cof = “Is φ undefined on finitely many elements?”
D_Ext = “Can φ be extended to a (total) computable function?”


Written succinctly:


D_H ≡_T D_Halt ≡_T D_{K_1} <_T D_Fin ≡_T D_Tot <_T D_Cof ≡_T D_Ext

This situation is depicted in Fig. 12.2.


Fig. 12.2 The undecidability of some decision problems. The problems in the same T-degree are equally difficult, but more difficult than the problems in lower T-degrees


If, for instance, D_Tot were decidable, D_Fin and D_H, D_Halt, D_{K_1} would also be decidable; however,
D_Cof and D_Ext would remain undecidable. □
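The comparisons made in this example can be collected in a small lookup table. The sketch below is only a bookkeeping aid: it numbers the three T-degrees of the example 1, 2, 3 in the order of increasing difficulty stated above (the numbering and all names are assumptions of this sketch), and it relies on the fact that these degrees are linearly ordered, so that an oracle for a set at some level decides every set at the same or a lower level.

```python
# Level of each set in the order of difficulty given in Example 12.1:
# 1 for K, K0, K1; 2 for Fin, Tot; 3 for Cof, Ext.
JUMP_LEVEL = {"K": 1, "K0": 1, "K1": 1, "Fin": 2, "Tot": 2, "Cof": 3, "Ext": 3}


def decidable_relative_to(oracle: str) -> list[str]:
    """Sets from the table that would become decidable given an oracle for `oracle`."""
    return sorted(s for s, level in JUMP_LEVEL.items()
                  if level <= JUMP_LEVEL[oracle])


def more_difficult(a: str, b: str) -> bool:
    """True if the problem for set a lies in a higher T-degree than the one for b."""
    return JUMP_LEVEL[a] > JUMP_LEVEL[b]
```

For instance, decidable_relative_to("Tot") lists Fin, K, K0, K1, and Tot but neither Cof nor Ext, in agreement with the closing remark of the example.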
12.4 Chapter Summary


We compare the difficulty of computational problems by relating the problems using 
the Turing reduction. 

Using the Turing jump operator, we can construct, for an arbitrary set, a set that
is at a higher T-degree than the original set.

Correspondingly, for every decision problem, even an undecidable one, there is
a more difficult decision problem.

This enables us to construct the jump hierarchy of T-degrees. The hierarchy starts
with the T-degree deg(∅) representing decidable decision problems, and continues
with infinitely many T-degrees. There is no highest T-degree in the jump hierarchy.
At this point nothing suggests that there might exist T-degrees other than those that
are members of the jump hierarchy, i.e., constructible by the Turing jump operator.

This means that the class of decision problems does not divide into just two
subclasses, deg(∅) and deg(K), one consisting of decidable problems and the other of
undecidable problems that are as difficult as the Halting Problem. Instead, the
subclass of undecidable decision problems is further partitioned into infinitely many
subclasses consisting of more and more difficult problems.

All of this is surprising and proves once again that human experience and intu- 
ition can be deceptive. 


Problems 

12.1. Prove:
(a) ∅' = K;
(b) ∅'' = K^K;
(c) ∅''' = K^(K^K).


[Hint. Use S = ∅ in Definitions 12.1 and 12.2.]
12.2. Prove Theorem 12.3. 
12.3. Prove: A is B-c.e. ⟺ A ≤_m B'.


12.4. Prove: A is B-c.e. ∧ B ≤_T C ⟹ A is C-c.e.


Remark. Thus, in Theorem 11.6, the conditions for the transitivity of ≤_T can be relaxed.

12.5. Prove: A ≤_T B ⟹ A' ≤_m B'.


12.6. Prove: A ≡_T B ⟹ A' ≡_m B'.


Remark. A similar implication will be proved in the next chapter (see Theorem 13.1, p. 274). 


12.7. Prove: A is B-c.e. ⟹ A is B̄-c.e.


Chapter 13
The Class D of Degrees of Unsolvability


A structure is something that consists of parts connected 
together in an ordered way. 


Abstract We now know that there are 1) infinitely many T-degrees; 2) the relation
< defined between these T-degrees; and 3) the Turing jump operator that constructs
from a given set a new set at a higher T-degree. In this chapter, T-degrees will
become the main object of our research. It will be useful to view T-degrees as members
of a certain class, D. We will define this class as a mathematical structure endowed
with a relation and a function that we will found on < and ', respectively. This view
will simplify our expression and the investigation of the properties of the structure.


13.1 The Structure (D, ≤, ')


Recall that a mathematical structure is a class endowed with certain relations and 
functions defined on the class (see the footnote on p. 40). In what follows, our inten- 
tion is to define a structure whose class will contain all T-degrees, while the relation 
and function on this class will be based on the relation < and the Turing jump oper- 
ator ’, respectively. 

First, observe that instead of viewing a T-degree as a ≡_T-equivalence class of
subsets of N, we can view it as a member of some other set. Of course, this is the
quotient set of 2^N relative to ≡_T, i.e., the class 2^N/≡_T of all ≡_T-equivalence classes
of 2^N. We will denote this class by D.


Definition 13.1. (Class D) The class D of all T-degrees is D = 2^N/≡_T.


D is not empty; we have seen in Chap. 12 that it has infinitely many members. 


Remarks. (1) We can interpret D as the class of all degrees of unsolvability of decision problems.
(2) It is customary to denote the members of D by boldface characters, e.g., a, b, c, d, or 0.
This notation indicates no representatives of T-degrees. When we need a representative of a
T-degree, we will indicate it explicitly, e.g., S ∈ d or d = deg(S), saying that S is of degree d.


Second, we already have a relation defined on D; this is the relation <, which
we introduced to compare T-degrees deg(∅) and deg(K). It turned out that there are
infinitely many T-degrees deg(S^(i)), i ∈ N, and that they are linearly ordered by <.
Now recall that linear order is just a special case of partial order (see Appendix A).
We want to be able to consider other partial orders on D, if any. For this reason we
will “relax” the relation < by replacing it with its reflexive closure ≤, defined as
usual by a ≤ b ⟺ a < b ∨ a = b.

Finally, we also have a function ', the Turing jump. But we must be careful before
we apply it to the members of D. Why? We have defined ' on sets (Definition 12.1,
p. 265), and not on T-degrees, so it is not so obvious that D can inherit ' as it
inherited the relation <. What we must clear up is the following question: If sets
A and B are in the same T-degree, can A' and B' be in different T-degrees? If this
could happen, then ' would not be well defined on D. Luckily, the answer to the
question is no. Thus the following theorem.


Theorem 13.1. A ≡_T B ⟹ A' ≡_T B'


Proof. First, we prove that A ≤_T B ⟹ A' ≤_T B'. So let A ≤_T B. We know that A' is A-c.e. (by
Theorem 12.2 b). But then, because A ≤_T B, the set A' is also B-c.e.: a B-TM can recognize A' by
answering each query to the oracle for A with the help of the oracle for B. Hence, A' ≤_T B' (by
Lemma 12.1). We have thus proved A ≤_T B ⟹ A' ≤_T B'. Second, we prove that
B ≤_T A ⟹ B' ≤_T A'. This is easy: We only have to swap A and B in the previous proof. Thus,
B ≤_T A ⟹ B' ≤_T A'. Finally, the two proved implications together result in A ≡_T B ⟹ A' ≡_T B'.


Informally, this means that T-jumps of sets of the same 7-degree are sets of the same 
T-degree. Consequently, we can extend the definition of ’ to be a function that maps 
a T-degree into a (single) T-degree. Thus we can talk about the Turing jump of a 
whole T-degree. In short, ’ is a well-defined function on the class D. Here is the 
definition. 


Definition 13.2. (T-jump of a T-degree) The Turing jump of a T-degree
d ∈ D is the T-degree d′ = deg(S′), where S is an arbitrary member of d.


The nth T-jump d^(n) of a T-degree d is defined in the same fashion as for sets
(see Definition 12.2, p. 267), so we omit the formal definition. Writing 0 instead
of deg(∅), the jump hierarchy of T-degrees is now


0 < 0^(1) < 0^(2) < ... < 0^(n) < 0^(n+1) < ...


The initial T-degrees we usually denote by 0 = 0^(0), 0′ = 0^(1), 0″ = 0^(2), 0‴ = 0^(3).


To summarize, the class D has been endowed with the relation ≤ and the function ′.
In the following sections we will list some properties of the structure (D, ≤, ′).


13.2 Some Basic Properties of (D, ≤, ′)


In this section we will list some of the properties that concern (D, ≤, ′) as a whole.
Specifically, we will explore facts about the cardinality and structure of (D, ≤, ′).


13.2.1 Cardinality of Degrees and of the Class D 


Concerning the cardinality, two questions are of interest: 


1. Cardinality of T-degrees. Given any T-degree, how many sets are there in it?
2. Cardinality of the class D. How many T-degrees are there in D?


Remark. Restating the above questions in view of decision problems we obtain: 
1) How many decision problems share a given degree of unsolvability? 
2) How many degrees of unsolvability are there? 


The first question is answered by the following theorem. 


Theorem 13.2. Every T-degree is countable. 


Recall that a set is countable if it is equinumerous to a subset of N. Thus, the cardinality of a
countable set is either a natural number or ℵ₀.


Proof. Let B ⊆ N be an arbitrary set. Define lcone(B) to be the set of all sets T-reducible to B;
that is, lcone(B) := {A | A ≤_T B}. The plan of the proof is this: We will prove that lcone(B) is
countable; since deg(B) ⊆ lcone(B), it will then follow that deg(B) is countable too.

To prove that lcone(B) is countable, we must construct an injection f : lcone(B) → N. How
can we do that? Let A ∈ lcone(B). Since A ≤_T B, there is a B-TM T^B capable of computing χ_A(a)
for any a ∈ N. In fact, there are countably infinitely many B-TMs equivalent to T^B, in the sense
that each of them computes χ_A (see Sect. 10.1.4). The set ind^B(A) of all indexes of such machines
is countably infinite (see Sect. 10.2.1). We now define the function f : lcone(B) → N by letting
f assign to A the smallest index in ind^B(A); that is, f(A) = min{x | T_x^B computes χ_A}. (Such an
index exists, as ind^B(A) ≠ ∅.) The function f is injective. (Otherwise, there would exist in
lcone(B) two different sets A_1, A_2 that would be decided by the same B-TM T^B_f(A_1) = T^B_f(A_2).
This would imply that χ_A_1 = χ_A_2, which is impossible because A_1 ≠ A_2.) Consequently, lcone(B)
is countable. As deg(B) ⊆ lcone(B), so is deg(B).
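The "least index" idea behind the injection f can be illustrated by a small sketch. It is not part of the original proof: the list MACHINES is a hypothetical, finite stand-in for the enumeration of all B-TMs, and equality of functions is tested only on a finite window (true equality of computed functions is not decidable), so the sketch conveys the idea rather than the actual construction.

```python
from typing import Callable, List, Optional

Oracle = Callable[[int], bool]            # characteristic function of a set of naturals
Machine = Callable[[Oracle, int], bool]   # toy stand-in for a B-TM T_x^B

# Hypothetical enumeration of oracle machines (an assumption made for illustration).
MACHINES: List[Machine] = [
    lambda B, n: B(n),                # index 0: computes B itself
    lambda B, n: not B(n),            # index 1: computes the complement of B
    lambda B, n: B(n) and B(n + 1),   # index 2
    lambda B, n: B(n),                # index 3: equivalent to index 0
    lambda B, n: n % 2 == 0,          # index 4: ignores the oracle
]

def least_index(chi_a: Oracle, oracle_b: Oracle, window: int = 64) -> Optional[int]:
    """Toy version of f(A): the smallest x such that machine x, run with oracle B,
    agrees with chi_A on 0..window-1; None if no listed machine does."""
    for x, machine in enumerate(MACHINES):
        if all(machine(oracle_b, n) == chi_a(n) for n in range(window)):
            return x
    return None

def in_b(n: int) -> bool:
    """Example oracle set B: the multiples of 3."""
    return n % 3 == 0
```

Two different sets can never receive the same index, because one machine with one oracle computes only one function; several indexes (here 0 and 3) may, however, belong to the same set, which is why the minimum is taken.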


But we can do more: Kleene and Post proved that each T-degree has cardinality ℵ₀.


Proposition 13.1. Every T-degree is countably infinite. 


Proof. We omit the proof. See the Bibliographic Notes to this chapter.


We now turn to the second question. How many T-degrees exist in D? We have
seen that the jump hierarchy 0 < 0^(1) < 0^(2) < ... < 0^(n) < 0^(n+1) < ... contains
ℵ₀ T-degrees. So, D contains at least as many T-degrees. What about other hierarchies
starting in decidable sets S besides ∅? Since all decidable sets are in 0, all
their Turing jumps are in 0′ (by Theorem 13.1). Thus, it seems that there can be no
other hierarchy besides the jump hierarchy. On the other hand, nothing says that the
Turing jump is the only way to discover new T-degrees. If other ways exist, other
degrees of unsolvability may exist as well. So the question is whether or not the total
number of degrees of unsolvability is larger than ℵ₀. In other words: Is the class D
countable or uncountable? Here is the answer.


Theorem 13.3. The class D is uncountable; its cardinality is 2^ℵ₀.


Proof. There are 2^ℵ₀ subsets of N. Since each is of a certain degree of unsolvability, there can be
at most 2^ℵ₀ T-degrees in D. Now, if there were only countably many (i.e., at most ℵ₀) T-degrees
in D, then, knowing (by Theorem 13.2) that each contains countably many sets, there would be
only countably many sets contained in all T-degrees together. (Note that “countably many × countably
many = countably many”; see Appendix A.) This would contradict the fact that N has 2^ℵ₀ subsets.
More generally, if D had κ < 2^ℵ₀ members, all T-degrees together would contain at most
κ · ℵ₀ = max(κ, ℵ₀) < 2^ℵ₀ sets. We conclude that there must be 2^ℵ₀ T-degrees in D.


Remarks. Let us interpret the above results in the world of decision problems. There are 2^ℵ₀
degrees of unsolvability (Theorem 13.3). (This is as many as there are real numbers; the
Continuum Hypothesis, p. 16, is not needed for this.) For each degree of unsolvability there are
ℵ₀ decision problems of that degree of unsolvability (Theorem 13.2 and Proposition 13.1). For
instance, ℵ₀ decision problems are as difficult as the Halting Problem (see Fig. 12.2). There are
ℵ₀ decision problems as difficult as the problem “Is a p.c. function φ total?” A similar situation
occurs for problems that are as difficult as the problem “Can a p.c. function φ be extended to a
computable one?”


Now we see that our cautiousness in Sect. 12.3 was well grounded: Besides the ℵ₀
T-degrees 0 < 0^(1) < 0^(2) < ... < 0^(n) < 0^(n+1) < ... there are many more T-degrees.
But where are they? Intuitively, we can identify two possibilities:


• there exist T-degrees that are not within the jump hierarchy;
• there exist intermediate T-degrees between a T-degree and its T-jump.


In the following, we will see that both are true. 


13.2.2 The Class D as a Mathematical Structure 


The class D is endowed with the relation ≤. Does this relation reveal any particular,
distinguished order in the class D? It is easy to prove the following theorem.


Theorem 13.4. (D, ≤) is partially ordered.


Proof. We leave it to the reader to check that ≤ is reflexive (a ≤ a, for any a ∈ D), transitive
(a ≤ b ∧ b ≤ c ⟹ a ≤ c, for any a,b,c ∈ D), and anti-symmetric (a ≤ b ∧ b ≤ a ⟹ a = b, for any
a,b ∈ D). □
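Theorem 13.4 is an instance of the general passage from a preorder (here ≤_T on sets) to a partial order on its equivalence classes (here ≤ on T-degrees). Since ≤_T itself is not computable, the following sketch uses a toy preorder, chosen only for illustration: a ⊑ b iff |a| divides |b| on nonzero integers. All names in it are hypothetical.

```python
from itertools import product

def leq(a: int, b: int) -> bool:
    """Toy preorder on nonzero integers: a is below b iff |a| divides |b|."""
    return abs(b) % abs(a) == 0

UNIVERSE = [n for n in range(-12, 13) if n != 0]

def degree(a: int) -> frozenset:
    """Equivalence class of a: all x with leq(a, x) and leq(x, a)."""
    return frozenset(x for x in UNIVERSE if leq(a, x) and leq(x, a))

DEGREES = sorted({degree(a) for a in UNIVERSE}, key=min)

def degree_leq(d: frozenset, e: frozenset) -> bool:
    """Induced order on classes; the result does not depend on the representatives."""
    return leq(next(iter(d)), next(iter(e)))

def is_partial_order(elems, le) -> bool:
    reflexive = all(le(a, a) for a in elems)
    antisymmetric = all(a == b for a, b in product(elems, repeat=2)
                        if le(a, b) and le(b, a))
    transitive = all(le(a, c) for a, b, c in product(elems, repeat=3)
                     if le(a, b) and le(b, c))
    return reflexive and antisymmetric and transitive
```

On the elements themselves anti-symmetry fails (2 and -2 are below each other), whereas on the classes all three properties hold. The classes of 2 and 3 are incomparable, so this toy order is partial but not linear.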


Incomparable T-Degrees 


Is it, perhaps, that (D, ≤) is even linearly ordered, as is, for example, (N, ≤)? Since
(D, ≤) is partially ordered, to answer this question we must find out whether or not
every two members of D are ≤-comparable, i.e., whether a ≤ b ∨ b ≤ a, for every
a,b ∈ D. Surprisingly, the answer is no. In 1954, Kleene and Post proved that there
exist sets A, B, both T-reducible to the set ∅′, such that A ≰_T B ∧ B ≰_T A. Since
A and B are ≤_T-incomparable, deg(A) and deg(B) are ≤-incomparable. (Note
that deg(A) ≤ 0′ and deg(B) ≤ 0′.) We write a|b when a,b are ≤-incomparable.
The proof of the following Kleene-Post theorem is instructive because its idea will
soon be developed further and used in Post’s Problem (see Chap. 14).


Theorem 13.5. There exist T-degrees a,b such that 0 < a,b and a,b < 0′ and a|b.


Proof. To prove the theorem, we must show that there exist two sets A and B such that A ≤_T ∅′,
B ≤_T ∅′, and A ≰_T B ∧ B ≰_T A. Since we expect that proving the existence of A and B in one fell
swoop might be too difficult (if not impossible), we will take a different approach: We will design
a set of guidelines by which A and B can be constructed, at least in principle, in a systematic,
algorithmic way. There are several ingredients in this method.

First, observe that the condition A ≰_T B means that no B-TM can decide the set A. Hence
the condition can be replaced by a conjunction R_0 ∧ R_1 ∧ R_2 ∧ ... of countably many simpler
requirements, where R_e requires that A cannot be decided by T_e^B, the eth B-TM. Equivalently, R_e
demands that χ_A, the characteristic function of A, not be the proper functional Φ_e^B:


R_e : χ_A ≠ Φ_e^B.

In the same fashion we replace the condition B ≰_T A with the sequence S_0 ∧ S_1 ∧ S_2 ∧ ..., where

S_e : χ_B ≠ Φ_e^A.


Consequently, to prove the theorem we must show how to construct A and B such that R_e and S_e
will be fulfilled for every e.

The plan is this: We will construct χ_A and χ_B in an infinite sequence of stages; at any stage
s, only the current approximations f_s and g_s to χ_A and χ_B, respectively, will exist; we will ensure
that the current f_s and g_s fulfill R_0 ∧ ... ∧ R_ℓ and S_0 ∧ ... ∧ S_ℓ, respectively, for some ℓ = ℓ(s); and
we will ensure that the length ℓ(s) of the fulfilled conjunctions will increase monotonically from
stage to stage. As a consequence, in the limit, the condition A ≰_T B ∧ B ≰_T A will be fulfilled.

At any stage s, the to-be-constructed sets A and B will be approximated by sets denoted by A_s
and B_s, respectively. Correspondingly, χ_A and χ_B will be approximated by f_s and g_s, respectively.
So, in the limit, we expect A_ω = A, B_ω = B and f_ω = χ_A, g_ω = χ_B. The construction of χ_A
and χ_B will start at stage s = 0 with f_0 and g_0 representing A_0 = B_0 = ∅. At each next stage, we
will try to fulfill first the requirement R_e, and then the requirement S_e, for some e. (Thus, at s = 1
we will fulfill R_0 and S_0; at s = 2, R_1 and S_1; and so on.) But we will try to do this in such a
way that once a requirement has been fulfilled, it will remain fulfilled forever. (We also say that
no requirement will be injured.) If we succeed in this plan, the requirements will become fulfilled
in the order R_0, S_0, R_1, S_1, R_2, S_2, ..., which will satisfy the condition A ≰_T B ∧ B ≰_T A. (What
about the other two conditions, A ≤_T ∅′, B ≤_T ∅′, that are set by the theorem? The reasons for
having these will become apparent soon.)

We have come to the question about how to fulfill R_e and S_e, given that all previous Rs and Ss
have been fulfilled. The answer will be given in an induction-like way by outlining what the next
stage s+1 should do in order to preserve the situation similar to that at stage s. So, assume that, at
stage s, we have constructed f_s and g_s in such a way that, for some n, both are defined everywhere on
the initial segment {0,1,...,n} of N, and all the requirements R_0, S_0, ..., R_{e−1}, S_{e−1} are fulfilled.
We can represent the function f_s by the sequence of its values on the segment, that is, by the word
a_s = f_s(0) f_s(1) ... f_s(n) ∈ {0,1}*. Of course, f_s(i) tells us whether or not i ∈ A_s (when 0 ≤ i ≤ n),
while f_s(i)↑ for i > n, as the status (membership) of these numbers in A is still open. (The same
holds for g_s, which is represented by a word b_s ∈ {0,1}*.) Then, in the next stage s+1, we will try
to define f_{s+1} and g_{s+1} in such a way that


• a_s ⊂ a_{s+1} and b_s ⊂ b_{s+1} will hold. That is, a_{s+1} is to be a proper extension of the word a_s, or,
equivalently, a_s is to be a proper prefix of the word a_{s+1}. (Similarly for b_{s+1}.) This means that
f_{s+1} is to be defined everywhere on {0,1,...,n,...,m}, for some m > n. (Similarly for g_{s+1}.)

• R_e and S_e will be fulfilled and none of R_0, S_0, ..., R_{e−1}, S_{e−1} will be injured.


If we attain the two objectives, the limit functions f_ω, g_ω : N → {0,1} will be total, and hence
characteristic functions of the sets A_ω = A and B_ω = B. These sets will fulfill all requirements R
and S.


Clearly, attaining the above objectives is the crux of the proof. We will prove the following lemma. 


Lemma. Let e be an arbitrary natural number. Given a_s, b_s, there exist extensions a_{s+1}, b_{s+1} such
that for any a, b that extend a_{s+1}, b_{s+1} and represent χ_A, χ_B, the B-TM T_e^B does not decide A.


Proof. Let x be an arbitrary natural number for which f_s is not defined: f_s(x)↑. (So x is a number
whose membership in A_s, and hence in A, has not been determined.) The crucial question is:


Is there a set B such that T_e^B would halt on input x and return either YES or NO? (*)


(i) If such a set B does not exist, then we can take the trivial extensions a_{s+1} := a_s and b_{s+1} := b_s.
(ii) If, however, such a B exists, there is more work to do in order to construct a_{s+1} and b_{s+1}.
First, we run T_e^B on x. Before halting, the machine asks the oracle finitely often whether or not
a number is in B. Let y be the largest of these numbers. Then, let b_{s+1} be the shortest extension
of b_s that covers y. Now it remains to construct a_{s+1}. Note that T_e^B, if run on x, would return the
same answer as T_e^{B_{s+1}}. So, to ensure that T_e^{B_{s+1}} (and hence T_e^B) will fail to answer correctly the
question x ∈? A_{s+1} (and hence the question x ∈? A), we add x into either A_{s+1} or its complement,
depending on whether the answer of T_e^{B_{s+1}} is NO or YES, respectively. As a consequence, the
limit machine T_e^B will fail to decide the limit set A.

It should be obvious that, once it has been determined which of the possibilities (i), (ii) holds,
the construction of a_{s+1} and b_{s+1} is computable in the ordinary sense. But how can we answer
the question (*)? A systematic search for B is out of the question. But we can obtain the answer by
focusing on the program δ_e of the o-TM T_e^*. Each instruction of δ_e branches in two directions.
This results in the computation tree of δ_e, a tree representing all possible executions of δ_e. Given a
particular oracle set B, the actual execution of δ_e on x is represented by a branch in this tree. Halting
executions are represented by finite branches, and non-halting executions by infinite branches.
Answers (YES, NO) are in the leaves of the tree (i.e., at the end of finite branches). We now see:
There exists a set B for which T_e^B halts on x and returns a YES or NO iff the computation tree of δ_e
has a leaf bearing the answer YES or NO. So we can apply an ordinary TM T that systematically
checks the leaves of the computation tree of δ_e and halts with an answer found as soon as a leaf with
an answer YES or NO has been found. In this case we know that a set B with the above properties
exists. But what if T never halts? So, we must find out whether or not T halts. Recall that this is
the Halting Problem, which can be solved by an o-TM with the oracle set K_0. Consequently, the
question (*) is K_0-decidable, and the construction of a_{s+1} and b_{s+1} is K_0-computable. (This is
where the condition B ≤_T ∅′ of the theorem comes from.) The lemma is proven.
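The search for a leaf of the computation tree can be sketched as follows. This is only an illustration under assumptions of our own: an oracle machine is modelled as a hypothetical Python function machine(oracle, x, fuel) that returns True (YES), False (NO), or None when its step budget fuel is exhausted, and finite oracle prefixes σ ∈ {0,1}* play the role of the finite branches. A real search would be unbounded; the bound max_len is there only so that the sketch terminates.

```python
from itertools import product
from typing import Callable, Optional, Tuple

class OutOfPrefix(Exception):
    """Raised when a machine queries a number not covered by the finite prefix."""

def search_leaf(machine: Callable, x: int, max_len: int
                ) -> Optional[Tuple[Tuple[int, ...], bool]]:
    """Look for a finite oracle prefix sigma on which the machine halts on x
    using only queries inside sigma; return (sigma, answer) or None."""
    for n in range(max_len + 1):
        for sigma in product((0, 1), repeat=n):
            def oracle(k: int, sigma=sigma) -> bool:
                if k >= len(sigma):
                    raise OutOfPrefix
                return sigma[k] == 1
            try:
                answer = machine(oracle, x, n)   # n also serves as the step budget
            except OutOfPrefix:
                continue
            if answer is not None:
                return sigma, answer
    return None

def waits_for_one(oracle, x, fuel):
    """Toy machine: answers YES as soon as some number >= x is in the oracle set."""
    for k in range(fuel):
        if oracle(x + k):
            return True
    return None   # budget exhausted

def never_halts(oracle, x, fuel):
    """Toy machine with no halting computation at all."""
    return None
```

For waits_for_one the search succeeds, which corresponds to case (ii) of the lemma. For never_halts the bounded search gives up, whereas the unbounded search T would run forever; deciding which of the two situations occurs is exactly the point where the oracle K_0 is needed.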


Observe that we can exchange the roles of A and B in the lemma and prove that a_s, b_s can be
extended so that ultimately the A-TM T_e^A will not decide B. Hence, applying the lemma twice,
we can fulfill both requirements R_e and S_e. Now we can finally see the whole process of the
construction of A and B. We consider oracle Turing machines in succession, T_0^*, T_1^*, T_2^*, ..., and,
based on the above lemma, ensure that none of them decides A. In doing so, we alternate, for each
o-TM T_e^*, the roles of the sets A and B and in this way ensure that the construction of both A and
B proceeds.
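The alternating finite-extension construction can be tried out on a toy scale. The sketch below makes a strong simplifying assumption that is not in the text: every oracle machine is a Python function machine(oracle, x) that halts on every input, so question (*) is always answered positively and no oracle for the Halting Problem is required. The machine list and all names are hypothetical; the point is only to show how the words a_s and b_s grow and why no requirement is injured.

```python
from typing import Callable, List, Tuple

Oracle = Callable[[int], bool]
Machine = Callable[[Oracle, int], bool]   # toy oracle machine, assumed to halt always

def run_on_prefix(machine: Machine, prefix: List[int], x: int) -> Tuple[bool, int]:
    """Run the machine on x with the oracle 'prefix padded by zeros'.
    Returns (answer, largest queried number or -1)."""
    largest = -1
    def oracle(n: int) -> bool:
        nonlocal largest
        largest = max(largest, n)
        return n < len(prefix) and prefix[n] == 1
    return machine(oracle, x), largest

def diagonalize(machine: Machine, target: List[int], oracle_prefix: List[int]) -> int:
    """Extend both words in place so that machine^oracle(x) differs from target(x)."""
    x = len(target)                          # first number whose status is still open
    answer, largest = run_on_prefix(machine, oracle_prefix, x)
    # Freeze every oracle bit the machine consulted (bits beyond the word become 0).
    oracle_prefix.extend([0] * (largest + 1 - len(oracle_prefix)))
    target.append(0 if answer else 1)        # x enters the target iff the answer was NO
    return x

def kleene_post(machines: List[Machine]):
    a: List[int] = []
    b: List[int] = []
    witnesses = []
    for m in machines:
        xa = diagonalize(m, a, b)            # requirement R_e: m with oracle B fails on A
        xb = diagonalize(m, b, a)            # requirement S_e: m with oracle A fails on B
        witnesses.append((xa, xb))
    return a, b, witnesses

def as_oracle(prefix: List[int], tail: bool) -> Oracle:
    """Any set extending the word: here all numbers beyond the word get the value tail."""
    return lambda n: (prefix[n] == 1) if n < len(prefix) else tail

MACHINES: List[Machine] = [
    lambda O, x: O(x),
    lambda O, x: not O(x),
    lambda O, x: O(2 * x) or O(x + 1),
    lambda O, x: x % 2 == 0,
]
A_WORD, B_WORD, WITNESSES = kleene_post(MACHINES)
```

Because later stages only append bits, each witness stays valid for every pair of sets extending the final words, whatever the tails are. In the actual proof the machines may diverge, which is why the construction there is only K_0-computable.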


So, there are incomparable T-degrees between 0 and 0′. Is this situation specific to
the pair 0, 0′? Far from it: ≤-incomparable T-degrees exist between d and d′, for
any T-degree d. Here is a generalization of the previous theorem. (See Fig. 13.1.)


Theorem 13.6. For any T-degree d there exist T-degrees a,b such that d < a,b
and a,b < d′ and a|b.


Proof. The proof is a relativization of the above proof. See the Bibliographic Notes to this chapter. 


Fig. 13.1 For any T-degree d there exist ≤-incomparable T-degrees a,b


Theorems 13.3 and 13.6 suggest that there might be uncountably many ≤-incomparable
pairs of T-degrees. That this is indeed so is stated in the next theorem.


Theorem 13.7. There are 2^ℵ₀ mutually ≤-incomparable T-degrees.


Proof. See the Bibliographic Notes to this chapter.


Distinguished T-Degrees


Since there are ≤-incomparable elements in D, the relation ≤ does not linearly
order D. This gives rise to a series of new questions about the existence of certain
distinguished elements:


1. Are there ≤-minimal, ≤-least, ≤-greatest, or ≤-maximal elements in D?
2. Do every two members of D have a ≤-upper bound or even a ≤-lub?¹
3. Do every two members of D have a ≤-lower bound or even a ≤-glb?²
4. Is (D, ≤) a lattice, and if so, of what kind?


Let us answer these questions. 

There can be no ≤-greatest and no ≤-maximal element, because the Turing jump
constructs, for an arbitrary T-degree d, a higher T-degree d′. But there is a ≤-least
element in (D, ≤).


Theorem 13.8. There is a ≤-least T-degree in (D, ≤); this is 0.


Proof. Since ∅ is decidable, ∅ ≤_T A for every set A. Hence, 0 ≤ a for every a ∈ D.


This is not surprising. The decidability of a decidable set is insensitive to the decidability
of other sets. Thus, the degrees of true unsolvability are in D − {0}.

Is (D, ≤) a lattice? Well, to be a lattice, any two T-degrees a,b must have a ≤-lub
and a ≤-glb. The first requirement is satisfied, as the following theorem states.


Theorem 13.9. Any two T-degrees have a ≤-least upper bound.


Proof. Let a = deg(A) and b = deg(B) be arbitrary T-degrees. Consider the join A ⊕ B of A and
B, that is, the set A ⊕ B := {2x | x ∈ A} ∪ {2y+1 | y ∈ B}. The members of A and B are injectively
mapped into even and odd members of the join, respectively. Informally, A ⊕ B remembers the
origin of each of its members. A ⊕ B is the ≤_T-lub of A and B. To prove this, we must check that
1) A ≤_T A ⊕ B and B ≤_T A ⊕ B, and 2) A ⊕ B ≤_T C, for any ≤_T-upper bound C of A, B. (This we
leave as an exercise.) It follows that deg(A ⊕ B) is the ≤-lub of a = deg(A) and b = deg(B).
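The join and the reductions mentioned in the proof are simple enough to be written down directly. In the following sketch (an illustration of ours, not the book's notation) a set is represented by its characteristic function, and a T-reduction by a function that turns a decider for one set into a decider for another.

```python
from typing import Callable

Oracle = Callable[[int], bool]   # characteristic function of a set of naturals

def join(in_a: Oracle, in_b: Oracle) -> Oracle:
    """Characteristic function of the join {2x | x in A} U {2y+1 | y in B}."""
    def in_join(n: int) -> bool:
        return in_a(n // 2) if n % 2 == 0 else in_b(n // 2)
    return in_join

def a_from_join(in_join: Oracle) -> Oracle:
    """A is T-reducible to the join: x is in A iff 2x is in the join."""
    return lambda x: in_join(2 * x)

def b_from_join(in_join: Oracle) -> Oracle:
    """B is T-reducible to the join: y is in B iff 2y+1 is in the join."""
    return lambda y: in_join(2 * y + 1)

def join_from_upper_bound(a_from_c: Callable[[Oracle], Oracle],
                          b_from_c: Callable[[Oracle], Oracle],
                          in_c: Oracle) -> Oracle:
    """If A and B are both T-reducible to C (via a_from_c and b_from_c),
    then so is the join: decide each half with the respective reduction."""
    return join(a_from_c(in_c), b_from_c(in_c))

def is_even(n: int) -> bool:
    return n % 2 == 0

def is_multiple_of_3(n: int) -> bool:
    return n % 3 == 0

EXAMPLE_JOIN = join(is_even, is_multiple_of_3)
```

Each reduction asks the oracle a single question, which covers part 1) of the proof; join_from_upper_bound mirrors part 2), since it decides the join using nothing but the given reductions to C.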


We denote the ≤-least upper bound of a,b ∈ D by a ∨ b. In particular, if a = deg(A)
and b = deg(B), then a ∨ b = deg(A ⊕ B). Informally, the set A ⊕ B remembers
every member of A and every member of B by keeping track of the origin of each
of its members. Thus, each representative of a ∨ b bears more information about
its contents than any representative of a or b. Finally, of course, any finite set of
T-degrees has a ≤-least upper bound.


In contrast to the above theorem, the ≤-glb of two degrees need not always exist.
This was first proved by Kleene and Post in 1954.


¹ least upper bound
² greatest lower bound


Theorem 13.10. There is a pair of T-degrees that have no ≤-greatest lower bound.


Proof. We omit the proof. See the Bibliographic Notes to this chapter. 


We must conclude that (D, ≤) is not a lattice. However, because of the existence
of least upper bounds, we say that (D, ≤) is an upper semi-lattice.


Remarks. Let us look at the above results in the world of decision problems. There are pairs of
decision problems such that neither is more difficult than the other (Theorem 13.6). For any finite
set of undecidable decision problems there is a “superproblem” (i.e., a decision problem whose
solution would make all problems in the set decidable) which is the easiest among all
“superproblems” of the set (Theorem 13.9).


13.2.3 Intermediate T-Degrees 


Can there be intermediate degrees between 0^(n) and 0^(n+1), that is, degrees that are
passed over when the T-jump takes 0^(n) to 0^(n+1)? The answer to this question has already
been obtained by Theorems 13.5 and 13.6. But we can ask further: How many
T-degrees can be passed over by a T-jump? The following theorem, which was also
proved by Kleene and Post in 1954, tells us that for each degree d there are infinitely
many pairwise ≤-incomparable degrees between d and d′. (See Fig. 13.2.)


Theorem 13.11. For any T-degree d and n ≥ 1, there are pairwise ≤-incomparable
T-degrees c_1, ..., c_n such that d < c_k < d′, for k = 1, ..., n.


Fig. 13.2 For any d, there are infinitely many pairwise ≤-incomparable T-degrees c_k that are
passed over by the T-jump from d to d′


Proof. See the Bibliographic Notes to this chapter.


The above theorem guarantees the existence of ≤-incomparable T-degrees between
d and d′. Can it happen that, for some d, there are T-degrees between d and
d′ that are comparable or even linearly ordered by ≤? If so, can such a sequence
of intermediate T-degrees have infinitely many members? The answer to both questions
is yes. (See Fig. 13.3.)


Theorem 13.12. For any T-degree d and n ≥ 1, there are T-degrees c_1, ..., c_n such
that d < c_1 < ... < c_n < d′.


Fig. 13.3 For any d, there are infinitely many linearly ordered T-degrees c_i that are
passed over by the T-jump from d to d′


Proof. We omit the proof. See the Bibliographic Notes to this chapter. 


Moreover, Kleene and Post proved that linearly ordered intermediate T-degrees are
dense, in the sense that if a and b are any such T-degrees, then there is a T-degree c
between them. This is stated in the next theorem.


Theorem 13.13. If d < a < b < d′, then there is a T-degree c such that a < c < b.


Proof. We omit the proof. See the Bibliographic Notes to this chapter. 


13.2.4 Cones 


It may seem that because of Theorems 13.3 and 13.7 there is not much left for
≤-comparable elements in D. But infinite sets contain equipollent proper subsets
(see Appendix), so there still can be many elements of D that are ≤-comparable.
This gives rise to the question: How many members of D are ≤-comparable to a
given d ∈ D?

Let us pick an arbitrary d ∈ D and consider all the elements of D that are
≤-comparable to d. We divide these elements into two sets, the set of elements
that are higher than (or equal to) d, and the set of elements that are lower than
(or equal to) d. These sets we call the upper cone and lower cone of d, respectively.
See Fig. 13.4. Here is the definition.


Definition 13.3. (Upper and Lower Cone) The upper cone of a T-degree d
is the set ucone(d) = {x ∈ D | d ≤ x}, and the lower cone of d is the set
lcone(d) = {x ∈ D | x ≤ d}.
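The two cones are defined for any partially ordered set, so the definition can be tried out on a small example. The sketch below uses a toy order that has nothing to do with Turing reducibility (divisibility on the numbers 1..24, an assumption made purely for illustration), since the order ≤ on D is not computable.

```python
def leq(a: int, b: int) -> bool:
    """Toy partial order on positive integers: a is below b iff a divides b."""
    return b % a == 0

UNIVERSE = range(1, 25)

def ucone(d: int) -> set:
    """Upper cone of d: everything at or above d."""
    return {x for x in UNIVERSE if leq(d, x)}

def lcone(d: int) -> set:
    """Lower cone of d: everything at or below d."""
    return {x for x in UNIVERSE if leq(x, d)}

def comparable_to(d: int) -> set:
    """All elements comparable to d: the union of the two cones."""
    return ucone(d) | lcone(d)
```

Here ucone(6) consists of the multiples of 6 and lcone(6) of its divisors; the number 4 lies in neither cone, that is, it is incomparable to 6.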


Fig. 13.4 The upper and lower cone of a T-degree d. Any T-degree that is ≤-comparable
to d is in d’s upper or lower cone


The next two theorems tell us what the population is of each of the two cones. 


Theorem 13.14. The upper cone ucone(d) is uncountable, for any T-degree d. 


Proof. We omit the proof. See the Bibliographic Notes to this chapter. 


Since we can construct with the Turing jump only ℵ₀ T-degrees, Theorem 13.14
tells us that the Turing jump cannot uncover all T-degrees above a given T-degree.
We have seen one method for constructing T-degrees not reachable by the Turing
jump in the proof of Theorem 13.5, and another will be described in the next chapter.


Theorem 13.15. The lower cone lcone(d) is countable, for any T-degree d.


Proof. We omit the proof. See the Bibliographic Notes to this chapter.


13.2.5 Minimal T-Degrees


By Theorem 13.8, the T-degree 0 is lower than any other T-degree d. Pick an arbitrary
d (≠ 0). There may exist a T-degree c which is strictly between 0 and d, that
is, 0 < c < d. (For instance, take d = 0″ and c = 0′; or, take d = 0′ and c = a in
Theorem 13.5.)

The following question immediately arises: Does there exist a T-degree d (≠ 0)
such that no T-degree is strictly between 0 and d? If such a d exists, it is called
minimal. (See Fig. 13.5.) Here is the definition.


Definition 13.4. (Minimal Degree) A T-degree d is minimal if d ≠ 0 and there is
no T-degree c such that 0 < c < d.
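The notion of a minimal element above the least element makes sense in any partial order with a least element. As a purely illustrative analogy (the order below is divisibility on 1..24 with least element 1, not the order of T-degrees), the following sketch tests the defining condition directly.

```python
def strictly_below(a: int, b: int) -> bool:
    """Toy strict order on positive integers: a properly divides b."""
    return a != b and b % a == 0

def is_minimal(d: int, universe, least: int = 1) -> bool:
    """d is minimal if it is not the least element and no c lies strictly
    between the least element and d."""
    return d != least and not any(
        strictly_below(least, c) and strictly_below(c, d) for c in universe
    )

MINIMAL = [d for d in range(1, 25) if is_minimal(d, range(1, 25))]
```

In this toy order the minimal elements are exactly the primes. For T-degrees nothing comparable can be computed; the existence of minimal degrees requires a genuine construction.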


Fig. 13.5 An element d ≠ 0 is minimal if no element other than 0 is lower than d


In 1956, Spector³ proved that minimal degrees actually exist. He proved the existence
of a minimal degree below 0″.


Theorem 13.16. There exists a minimal T-degree d; in addition, d < 0″.


Proof. We omit the proof. See the Bibliographic Notes to this chapter.


In 1961, Sacks⁴ proved the existence of minimal degrees that are even below 0′.


Remarks. Let us interpret the above results in the world of decision problems. For every degree
of unsolvability there are ℵ₀ decision problems of that difficulty (Proposition 13.1). There are
uncountably many degrees of unsolvability that can have a decision problem (Theorem 13.3). There
exist pairs of decision problems such that neither problem is easier (or more difficult) than the
other (Theorems 13.5 and 13.6). Actually, there are uncountably many pairs of such incomparable
decision problems (Theorem 13.7). For any finite set of decision problems there exists a
“superproblem” whose decidability would imply the decidability of every decision problem in the set.
In addition, there is no easier superproblem for this set of decision problems (Theorem 13.9).
For any decision problem, there are uncountably many more difficult decision problems
(Theorem 13.14), and only countably many easier decision problems (Theorem 13.15). There are
undecidable decision problems for which there exist no easier decision problems besides the
decidable ones (Theorem 13.16).


³ Clifford Spector, 1931-1961, American mathematician.
⁴ Gerald Enoch Sacks, b. 1933, American mathematician and logician.


13.3 Chapter Summary 


Every T-degree is countably infinite. The class of all T-degrees is uncountable and
is partially ordered by the relation ≤. The Turing jump operator maps a T-degree
into a higher T-degree. There exist T-degrees that are incomparable with respect to the
relation ≤; in fact, there are uncountably many pairs of such incomparable degrees.
The T-degree of decidable decision problems, 0, is the ≤-least T-degree in the class.
Any two T-degrees have a ≤-least upper bound, but not necessarily a ≤-greatest
lower bound. Between any T-degree and its T-jump there are infinitely many pairwise
incomparable T-degrees and infinitely many linearly ordered T-degrees. The
upper cone of any T-degree contains uncountably many T-degrees, and its lower
cone contains only countably many T-degrees. There are minimal degrees, some of
which are below the T-degree 0″, or even below 0′.


Problems 


13.1. Prove: (D, ≤) is partially ordered.


13.2. Prove: There are sets A and B such that A <_T B and A′ ≡_T B′.


Remark. Thus, the converse of the implication A ≡_T B ⟹ A′ ≡_T B′ (Theorem 13.1) does
not hold; we may have A ≢_T B and still A′ ≡_T B′.


13.3. Prove: The Turing jump operator ′ defined on the class D is not injective.
[Hint. See Problem 13.2.]


13.4. Complete the proof of Theorem 13.9.


Chapter 14
C.E. Degrees and the Priority Method


If something has priority over other things, it is regarded as 
being more important than them and is dealt with first. 
An injury is damage done to somebody’s body. 


Abstract Among the Turing degrees, the so-called computably enumerable (c.e.) 
degrees are all-important. This is because they stem from c.e. sets, the sets that often 
spring up in practice. In this chapter we will present the basic facts of c.e. degrees. 
We will then describe Post’s Problem, a problem about c.e. degrees that was posed 
by Emil Post in 1944. After a series of attempts by Post and others, the problem 
was finally solved in 1956 by Muchnik and Friedberg. They simultaneously and 
independently devised a method, called the Priority Method, and applied it to solve 
the problem. We will describe Post’s Problem and the Priority Method. 


14.1 C.E. Turing Degrees 


Undecidable c.e. sets are interesting for several reasons. The first reason is that most 
of the undecidable decision problems that have arisen in practice, i.e., outside pure 
Computability Theory, are represented by undecidable c.e. sets. (See Sect. 8.3 for a 
number of such problems that emerged in various areas of science.) The second rea- 
son is that, roughly speaking, any such set is in a way “close” to the decidable sets, 
in the sense that it is more amenable to recognition than any other, non-c.e. set. Put 
differently, for any such set the membership problem is semi-decidable since there 
exists an algorithm that is capable of recognizing any member of the set, though 
incapable of recognizing every non-member of the set. (See Sect. 8.2.) 


Remarks. An undecidable c.e. set is mirrored in the corresponding decision problem. There exists 
an algorithm capable of solving any positive instance of the problem, but no algorithm is capable 
of also solving any negative instance. Informally, such a decision problem is just semi-solvable. 


As a consequence, much research has been and still is devoted to c.e. sets and their 
degrees of unsolvability. We will describe some of this research in this chapter.


Completeness and c.e. Degrees


In 1944, Post proved that the set K has the following interesting property: Every
c.e. set is 1-reducible to K. (See Definition 9.2 (p. 228) and Problem 9.3 (p. 228).)
Since 1-reducibility is also m-reducibility (Sect. 9.2.4), and hence T-reducibility, it
followed that every c.e. set is T-reducible to K. Post called a set such as K complete
relative to T-reducibility. We now define the concept of completeness more generally,
i.e., relative to an arbitrary reducibility ≤_C. (See the discussion of ≤_C in Sect. 9.2.1.)


Definition 14.1. (Complete Set) Let ≤_C be an arbitrary reducibility. A c.e. set S is
said to be C-complete if A ≤_C S for every c.e. set A.


So, Post was aware that K is T-complete. As this played an important role in his 
further research, we state it as a theorem. 


Theorem 14.1. The set K is T-complete. 


Just as c.e. sets are important, so are their degrees of unsolvability. These we 
introduce in the next definition. 


Definition 14.2. (c.e. Degree) A T-degree is computably enumerable (c.e.) if it 
contains a c.e. set. 


Remark. It is customary to call a T-degree that is computably enumerable just a c.e. degree. 


Since ∅ and K are c.e. sets (see Sect. 8.2.1), both 0 and 0′ are c.e. degrees.


14.2 Post’s Problem 


In 1944, Post asked whether there are any c.e. degrees strictly between 0 and 0’. 
Here is the citation from his paper of 1944: 


A primary problem in the theory of recursively enumerable sets is the problem of determin- 
ing the degrees of unsolvability of the unsolvable decision problems thereof. We shall early 
see that for such problems there is certainly a highest degree of unsolvability. Our whole 
development largely centers on the single question of whether there is, among these prob- 
lems, a lower degree of unsolvability than that, or whether they are all of the same degree 
of unsolvability.


Remarks. “Recursively enumerable sets” is the old name for c.e. sets. The problems of interest
to Post are decision problems associated with undecidable c.e. sets, that is, undecidable semi- 
decidable decision problems. Clearly, the highest degree of unsolvability of such problems is 0’. 
Post asks whether or not any such problem can be of lower degree of unsolvability than 0’. 


This question became known as Post’s Problem. 


Definition 14.3. (Post’s Problem) Is there a c.e. degree c such that 0 < c < 0′?


We already know that there are T-degrees strictly between 0 and 0′ (Theorem 13.5).
However, the T-degrees constructed in the proof of this theorem need not be c.e. This
is why Post embarked on solving this problem. In the next section we will describe 
his approach. Although Post’s approach proved to be only partially successful, it 
nevertheless brought many new ideas and, more importantly, motivated researchers 
to search for different methods that would lead to a solution to the problem. 


14.2.1 Post’s Attempt at a Solution to Post’s Problem 


To solve this problem positively, Post wanted to prove that there is a c.e. set A
such that ∅ <_T A <_T K. How would he do that? He had the following idea: If he
managed to prove that there exists an undecidable c.e. set A such that K ≰_T A, then
∅ <_T A <_T K would follow (because ∅ <_T A holds for every undecidable set A, and
A ≤_T K holds for every c.e. set A by Theorem 14.1). In other words, Post wanted to find
an undecidable c.e. set that is T-incomplete. So he set a list of goals, called Post’s
Program, that should be attained in order to solve the problem.


Post’s Program had three goals: 


A. define a property of a c.e. set; 
B. prove that any set with this property is undecidable and T-incomplete; 
C. prove that there exist c.e. sets with this property. 


He started modestly, with the m-reducibility ≤_m, which is more special than ≤_T, 
the Turing reducibility. Then, if he succeeded in fulfilling his program with ≤_m, he 
would try to generalize the proof to the reducibility ≤_T. 

Suppose that K ≤_m A for some c.e. set A. Then, there is a computable function 
r such that r(K) ⊆ A and r(K̄) ⊆ Ā (see Problem 9.1b on p. 228). Post’s idea 
was to achieve K ≰_m A by defining A so that Ā would be unable to contain r(K̄), 
for any computable function r. Thus, the complement Ā would be too “sparse” to 
accommodate r(K̄), in the sense that it would lack certain objects which, in contrast, 
abound in K̄. Post hoped that this property of Ā would prevent the m-reduction of 
K to A. 
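
The two inclusions that characterize an m-reduction can be made concrete on a toy example. The following Python sketch is only an illustration under simplifying assumptions: the sets E and F, the function `r`, and the helper `is_m_reduction` are hypothetical stand-ins, chosen to be decidable so that the check can actually be run, whereas K itself is undecidable.

```python
def r(x):
    """Hypothetical computable reduction function."""
    return 2 * x

def in_E(x):
    """Toy source set E: the even numbers."""
    return x % 2 == 0

def in_F(x):
    """Toy target set F: the multiples of 4."""
    return x % 4 == 0

def is_m_reduction(r, in_source, in_target, bound):
    """Check that x is in the source set iff r(x) is in the target set, for all x < bound.

    This is equivalent to r(source) being inside target and
    r(complement of source) being inside the complement of target (on the tested range).
    """
    return all(in_source(x) == in_target(r(x)) for x in range(bound))
```

Here `is_m_reduction(r, in_E, in_F, 1000)` holds because `r` maps E into F and the complement of E into the complement of F. A finite check of this kind is evidence, not a proof.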
How should he define this property of “having a sparse complement”? Post analyzed 
the properties of the set K and discovered that the complement K̄ contains 
infinitely many infinite c.e. subsets. (See the details in Box 14.1.) So, if Ā had no 
infinite c.e. subsets at all, it might be unable to host r(K̄), for any computable function r. 
This led him to the definition of simple sets. Informally, a c.e. set A is simple if Ā is 
“sparse,” in the sense that, although infinite, Ā does not contain any infinite c.e. set. 


Here is the definition. 


Definition 14.4. (Simple Set) A set A is simple if A is c.e. and Ā is infinite 
but Ā contains no infinite c.e. set. 


Next, in accordance with goal B of his program, it was necessary to prove that 
a simple set A would actually do the job, i.e., guarantee that K ≰_m A. In other 
words, a simple set would be m-incomplete. That this is indeed so, Post proved in 
the following theorem. 


Theorem 14.2. If a set A is simple, then A is not m-complete. 


It remained to be proved that a simple set actually exists (goal C). Using diago- 
nalization, Post proved the following theorem. 


Theorem 14.3. There exists a simple set A. 


Proof. We omit the proof. See the Bibliographic Notes to this chapter. 


In summary, Post discovered that there exist c.e. sets that are strictly between ∅ and 
K in the ≤_m ordering. So he could write the following corollary. 


Corollary 14.1. There is a c.e. set A such that ∅ <_m A <_m K. 


Yet, this was not the final solution to his problem, because the question was 
whether the statement of Corollary 14.1 holds for the Turing reducibility ≤_T. 

In order to generalize the above result to the ordering ≤_T, Post defined a more 
general notion of reducibility, the bounded truth-table reducibility ≤_btt. (See the 
Bibliographic Notes to this chapter for definitions of the notions that we mention in 
this paragraph.) Post was able to prove that if A is simple, then A is btt-incomplete 
(i.e., K ≰_btt A). Thus, there is a c.e. set A such that ∅ <_btt A <_btt K. This success 
led him to try with the still more general notion of truth-table reducibility ≤_tt (that 
is, ≤_btt with no bounds). However, it turned out that if A is simple, the relation 
K ≰_tt A cannot be proved. Since this indicated that the notion of the simple set 
was too weak, he defined the more powerful notion of the hyper-simple set. He was 
then able to prove that (goal B) if A is hyper-simple, then A is tt-incomplete (i.e., 
K ≰_tt A), and (goal C) a hyper-simple set actually exists. It followed that there is a 
c.e. set A such that ∅ <_tt A <_tt K. 

The reducibilities ≤_1, ≤_m, ≤_btt, and ≤_tt are called strong. This is because they 
are obtained by imposing additional conditions on the Turing reducibility <r. We 
now see that Post aimed to gradually relax the strength of the additional conditions 
and eventually obtain the Turing reducibility; during this he would define, for each 
new weaker condition C, a new kind of c.e. sets (goal A), prove that such sets meet 
the condition ∅ <_C A <_C K (goal B), and prove the existence of such sets (goal C). 

Unfortunately, his progress stalled after success with the truth-table reducibility 
(C = tt), and this was still far from the Turing reducibility. To continue, he defined 
hyperhyper-simple sets, but he couldn’t prove their existence. 


Box 14.1 (Creative Sets). 


When Post analyzed the set K and its complement K̄, he discovered that K possesses an interesting 
property, which he then called creativity. Here is the definition of a creative set. 


Definition 14.5. (Creative Set) A set C is creative if C is c.e. and there is a p.c. function φ such 
that (∀x)[W_x ⊆ C̄ ⇒ φ(x)↓ ∧ φ(x) ∈ C̄ − W_x]. 


What does that mean? First, observe that W_x denotes the xth c.e. set. (To see that, combine Defi- 
nition 6.4 (p. 135) and Problem 6.7 (p. 151).) The function φ produces, for any W_x ⊆ C̄, an element 
φ(x) witnessing that C̄ ≠ W_x (as φ(x) ∈ C̄ − W_x). We call φ the production function of the creative 
set C. Since we can effectively produce, for any W_x ⊆ C̄, a counterexample for the assertion C̄ = W_x, 
we say that C̄ is effectively non-c.e. Hence, C is an effectively undecidable set. 

The set C̄ contains an infinite c.e. set; moreover, it contains infinitely many infinite c.e. sets (see 
Problem 14.2). 

Since the set K is creative (see Problem 14.3), K̄ is effectively non-c.e. and contains infinitely 
many infinite c.e. sets. Now Post’s next step was obvious: For the searched-for set A, take a non- 
creative set. Simple sets are suitable non-creative sets (see Definition 14.4). 


In summary, the ultimate goal of Post’s Program for solving Post’s Problem was 
to define an appropriate structural property of c.e. sets that would guarantee the ex- 
istence, undecidability, and incompleteness of c.e. sets having this property. As we 
have seen, the structural property that attracted Post was “sparseness of the com- 
plement.” Could Post have succeeded in attaining his program had he continued 
his investigation in this direction? The answer is no. In 1965, Yates¹ proved that 
the structural property of having a “sparse complement” and the new hyperhyper- 
simple sets could not lead to a positive solution to Post’s Problem. 


¹ C.E.M. Yates, British mathematician, logician, and computer scientist. 
14.3 The Priority Method and Priority Arguments 


Nevertheless, today we know that the answer to Post’s Problem is positive: There is 
a c.e. set A such that ∅ <_T A <_T K, and there is a c.e. degree a such that 0 < a < 0’. 
How was the problem resolved? 

Post posed his problem and described his attempt at a solution in 1944. Ten 
years later, in 1954, when Kleene and Post published their seminal paper containing 
many other discoveries and ideas (see Chap. 13), the problem was still open. In 
the paper, Kleene and Post also introduced the method of finite extensions, which 
they used to prove the existence of ≤-incomparable T-degrees (see Theorem 13.5). 
This attracted the attention of several researchers. In 1956, two of them, Friedberg² 
and Muchnik³, simultaneously and independently upgraded the finite extensions 
method into a subtler one, the Finite-Injury Priority Method, as it is called today. By 
applying it, Friedberg and Muchnik obtained a positive answer to Post’s Problem. 


14.3.1 The Priority Method in General 


We will now describe the Priority Method in general. Let P be a property sensible 
for sets. Is there a c.e. set with the property P? To answer the question affirmatively, 
we can embark on the construction of such a set. We can try to construct a c.e. set S 
with the property P by adhering to the guidelines described in the following. 


1. The set S will be constructed step by step, in an infinite sequence of stages. 
Each stage, say i, will construct a finite set S_i, which will be an approximation of 
the set S. Informally, S_i will be obtained by adding new elements into S_{i−1} and/or 
banning certain elements from entering S_i. Intuitively, we want the sets S_i to 
monotonically grow as i increases, and eventually (in the limit) develop into the set S. 
The plan is to define the stages in such a way that two objectives will be achieved: 


Objective 1: S_{i−1} ⊆ S_i, for every i ≥ 1; 
Objective 2: ⋃_{i=1}^{∞} S_i = S. 


That is, each S_i should be a better approximation of S than S_{i−1}, so that lim_{i→∞} S_i = S. 


But, isn’t our plan unrealistic? How can we approximate an unknown object such 
as S? If S exists, it will become fully known only after infinitely many stages of the 
construction; and yet, we intend to obtain, at each stage i, a better approximation S_i 
of S. How will we evaluate the quality of S_i? Will S_i correctly extend S_{i−1}? This 
information will be needed to avoid missing the objectives. 


² Richard Michael Friedberg, b. 1935, American theoretical physicist. 
³ Albert Abramovič Mučnik, 1934–2019, Russian mathematician. 
2. The above situation is resolved by making a radical change in our view of what 
is approximated during the construction. Instead of approximating the unknown set 
S, we approximate the known property P. The idea is that the properties of each 
finite set S_i should approximate the property P; that is, they should satisfy, at least 
partially, the property P. To implement this idea, we must somehow atomize P, 
i.e., break it into a set of primitive properties. Only then will a set S_i be capable of 
fulfilling a part of P. The primitive properties are called the requirements. Clearly, it 
is sensible to atomize P in such a way that any set fulfilling all the requirements will 
fulfill the whole of P. (Such would be the set S.) To ensure this, we must break P 
into a conjunction of requirements. So we have developed the first two guidelines: 


G1: Write P as a conjunction R_0 ∧ R_1 ∧ R_2 ∧ ... of countably many requirements R_i. 
G2: Do G1 in such a way that a set has the property P ⇔ the set fulfills every R_i. 


3. What is a requirement and how is it fulfilled? In order to keep it as simple 
as possible, a requirement is only allowed to specify that certain numbers must be 
added (i.e., enumerated) into the set S_i and/or that certain other numbers must be 
kept out of this set (for some i). So, a requirement will be fulfilled by carrying out 
finitely many instructions of the form 


x ∈! S_i (add x into S_i) 
or 
x ∉! S_i (ban x from S_i). 


Note that x ∉! S_i does not mean that x is deleted from S_i; it means that x is kept out 
of S_i. A requirement R is initiated (tried to be fulfilled) at the stage when it receives 
attention. At that stage, say i, R can be fulfilled simply by ensuring the presence or 
absence of a particular number in the set S_i, i.e., by imposing either x ∈! S_i or x ∉! S_i, 
where x is a candidate, that is, a number whose status (potential membership in S_j) has 
not been considered, for any j < i. In general, there are infinitely many candidates 
and we must choose one of them. This is summarized in the following guideline: 


G3: At stage i, a requirement R is fulfilled by deciding, for a candidate x, whether 
x ∈! S_i or x ∉! S_i. 


Remark. Fulfilling R uncovers just a small part of the information about S, namely, the status of 
the candidate number. The current information about the contents of S, which has been gathered 
by fulfilling requirements up to the end of stage i, is represented by S_i. Since S_i fulfills more 
requirements than S_{i−1}, it meets P better than S_{i−1}, and hence better approximates S. 


4. It is now obvious that we must arrange the construction in such a way that each 
and every requirement will eventually be considered. If we succeed in finding such 
an arrangement, then each and every requirement will get a chance to be fulfilled, so 
that all the requirements may eventually become fulfilled. In that case, they will be 
fulfilled by the limit set lim_{i→∞} S_i. According to G2, this set will have the property 
P, so it will be the searched-for set S. 
An intuitively appealing guideline for considering the requirements would be: 
Consider R_0, then consider R_1, then consider R_2, and so on. Actually, this arrangement 
worked in the proof of Theorem 13.5 (p. 277), and the reason for this is easy to 
find: A requirement, once fulfilled, always remained fulfilled. The proving method 
where this holds is called the Finite Extension Method, for short the FEM. But the 
FEM is not considered to be a true Priority Method, because it cannot deal with the 
more general situation, which we describe in the following. 


5. Unfortunately, considering the requirements in a simple succession may not 
suffice. Why? By any stage only finitely many requirements can be fulfilled, which 
leaves infinitely many Rs to be considered and fulfilled in the rest of the construc- 
tion. But, the future requirements may not be independent of those already fulfilled; 
they may interact with each other. So, we are faced with a permanent lack of infor- 
mation about future Rs—and this may have significant consequences. 

In particular, it may turn out, at any stage i, that it is impossible to fulfill the 
current requirement R. How can that happen? There can be two reasons. First, at 
some previous stage j < i, a requirement R’ was fulfilled in a wrong way; that is, 
R’ was fulfilled by banning a candidate y from S_j, i.e., setting y ∉! S_j, and this 
decision now, at stage i, prevents us from fulfilling R. In short, we banned from S_j a 
candidate that we shouldn’t have, and we could not anticipate that at stage j. The 
second reason can be that R’ and R are contradictory requirements and cannot both 
be fulfilled. (Isn’t there also a third possibility? Cannot y ∈! S_j be a wrong decision? 
Since the set S is to be c.e., we assume that the decision y ∈! S_j ⊆ S is well grounded 
and can be made effectively, so there will be no need to revoke it.) 

We see that there should be a possibility of returning to a previously fulfilled re- 
quirement and fulfilling it in some other way that would allow us to proceed with the 
construction. But, things are more complicated than that: Changing a decision about 
S_j (e.g., from y ∉! S_j to y ∈! S_j) may affect the sets S_{j+1}, ..., S_{i−1} (which were 
constructed from S_j), so also the requirements fulfilled at stages j+1, ..., i−1 may 
have to be reconsidered and fulfilled in some other way. 


6. How can the bewildering situation described in 5 be controlled so that all the 
changes will be systematically enforced? Here, Friedberg and Muchnik introduced 
a new ingredient in the method: They assumed that different requirements have 
different priorities. Informally, R’s priority is used to represent the importance of R. 
Here is the new guideline. 


G4: With different requirements associate different priorities. 


It proves to be useful to index the Rs in such a way that their priorities decrease as 
the index increases; that is, R_k is of higher priority than R_{k+1}, for any k ≥ 0. So, a 
requirement with a lower index has higher priority and is therefore more important. 
From now on we will assume that such an indexing has been done. 
7. The construction is arranged so that, at each stage, the next initiated requirement 
is the one that has the highest priority among the requirements that require 
attention at that stage. We say that such a requirement receives attention. 


G5: At any stage initiate the highest-priority R that requires attention at that stage. 


8. How did Friedberg and Muchnik apply the concept of priorities to resolve 
the situation described in paragraph 5? Recall the situation: To fulfill the current R, 
a previously fulfilled R’ should be reconsidered and fulfilled in some other way. 
Whether or not this is allowed will depend on the priorities of R and R’ in the 
following way: 


• R has higher priority than R’. If R’ was fulfilled by wrongly choosing y ∉! S_j, 
we are allowed to return to R’, revoke that choice, and change it to y ∈! S_j. This 
will enable the fulfillment of R. However, it will also turn R’ unfulfilled again. 
We say that R’ has been injured by R. (We will have to cure R’, that is, fulfill R’ 
in some other way that does not affect R. We will do that by considering some 
other candidate number y’ and choosing either y’ ∈! S_i or y’ ∉! S_i.) 

• R has a lower priority than R’. In this case R is not allowed to injure R’. 


This is summarized in the next guideline. 
G6: A requirement can be injured only by a higher-priority requirement. 


9. Friedberg and Muchnik’s next key assumption was that an injured requirement 
R will be reconsidered after all the injured requirements having higher priority than 
R are cured. Here is the guideline. 


G7: An injured R will be initiated after all higher-priority injured Rs are cured. 


If R_k is injured, then there can be at most k more important injured requirements: 
R_0, ..., R_{k−1}. It will take finitely many steps to cure all of them, so an attempt at R_k’s 
recovery is guaranteed to start after a finite number of stages. 


10. Although only finitely many requirements can injure an R, there might still 
exist a requirement that would injure R infinitely many times. In such a case, R 
would never stop being injured and could not recover once and for all. Could the 
entire conjunction R_0 ∧ R_1 ∧ R_2 ∧ ... then be fulfilled? To prevent such a situation, 
Friedberg and Muchnik assumed that each requirement can be injured only finitely 
many times. So, they introduced the following guideline. 


G8: A requirement can be injured only finitely many times. 


This concludes the general description of the Priority Method for the construction 
of a set S with property P. However, there are several variations of the method. For 
example, when the method adheres to all of the guidelines G1, ..., G8, it is called 
the Finite-Injury Priority Method (FIPM). When G8 cannot be assumed, we obtain 
the Infinite-Injury Priority Method (IIPM). 

A proof using any kind of Priority Method is called a priority argument. Priority 
arguments have been classified into a hierarchy based on their complexity. 
14.3.2 The Friedberg-Muchnik Solution to Post’s Problem 


Using their Priority Method, Friedberg and Muchnik proved that there exist c.e. sets 
A and B such that A ≰_T B ∧ B ≰_T A. (As A, B are c.e., also A ≤_T ∅’ and B ≤_T ∅’.) 
It immediately followed that deg(A) and deg(B) are ≤-incomparable c.e.-degrees, 
and Post’s Problem was solved positively. Here is the Friedberg-Muchnik theorem. 


Theorem 14.4. There exist incomparable c.e.-degrees. 


Proof. We describe the idea of the proof; for the details see the Bibliographic Notes to this chapter. 
Friedberg-Muchnik’s general goal was to extend the proof of Theorem 13.5 (p. 277) to the case of 
c.e. sets. So the requirements stated in that proof remain the same (now they are all denoted by Rs): 


R_{2e}:   χ_A ≠ φ_e^B 
R_{2e+1}: χ_B ≠ φ_e^A 


Fulfilling a Single Requirement. To fulfill R_{2e}, we first associate with it a candidate number, an x 
whose status (membership in A) is still open. Then we look for a stage s+1 such that φ_{e,s}^{B_s}(x)↓ = 0. 
If there is no such stage, then this means that x ∉ A and φ_e^B(x)↑ ∨ φ_e^B(x)↓ ≠ 0, implying that x 
fulfills R_{2e}. If, however, such an s+1 exists, then R_{2e} will require attention at that stage. 

When, at stage s+1, R_{2e} actually receives attention, we do the following: 


• add x into A_{s+1} (and, hence, into A); 

• protect the construction. We do this by trying to restrain too-small numbers y from later 
entering B by any requirement of lower priority than R_{2e}. More specifically, we choose new 
candidates for all (lower-priority) requirements R_k, k > 2e, and initialize them. This ensures 
that only (higher-priority) requirements R_k, k < 2e, can later injure R_{2e} by adding some small y 
into B. 


This fulfills the requirement R_{2e}. If later a requirement of higher priority than R_{2e} enumerates a 
number into B and thereby injures R_{2e}, then R_{2e} is initialized and must be fulfilled again by another candidate. 
(To fulfill a requirement R_{2e+1} we use the same strategy as for R_{2e}, but with the roles of A and B 
reversed.) 


Fulfilling All Requirements. We saw that, occasionally, we must choose new candidate numbers 
for some requirements. There are two restrictions on this: First, for any R_k, we can choose another 
candidate for R_k only finitely often; and second, candidates for different requirements must be 
distinct. The latter restriction is met by choosing all candidates for a requirement R_k from the set 


N^[k] = {⟨n,k⟩ | n ∈ N}. 


Construction of A and B. Initially, we have A_0 = B_0 = ∅. At stage s > 0 we do the following. 
Let R_k be the highest-priority unfulfilled requirement and let r be the stage at which R_k was most 
recently initialized; of course, r < s. Then we determine x in the following way: 


• if k = 2e:   Let x be the least x ∈ N^[k] − A_{s−1} such that x > r and φ_{e,s}^{B_{s−1}}(x) = 0; 
• if k = 2e+1: Let x be the least x ∈ N^[k] − B_{s−1} such that x > r and φ_{e,s}^{A_{s−1}}(x) = 0. 


In the first case we add x into A, and in the second case we add x into B. (Observe that x < s.) We 
then declare R_k fulfilled and initialize all requirements of lower priority (as described above). 


It can be proved that, for every k, requirement R_k receives attention at most finitely often, is injured 
at most 2^k − 1 times, and is eventually fulfilled forever. 
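
The bookkeeping of this construction can be imitated in a small Python simulation. It is only a toy sketch under strong assumptions, not the construction used in the proof. The oracle machines are replaced by hand-written hypothetical functionals (`phi_zero`, `phi_next`), each of which returns its output, or `None` when it has not converged within s steps, and by convention queries only oracle numbers below s. Candidates for R_k are the numbers congruent to k modulo the number of requirements (a stand-in for N^[k]). The scan picks the highest-priority unfulfilled requirement that has a suitable candidate, and an injury is counted whenever an already fulfilled requirement is initialized.

```python
def phi_zero(oracle, x, s):
    """Toy functional: ignores the oracle and outputs 0 once s exceeds x."""
    return 0 if s > x else None

def phi_next(oracle, x, s):
    """Toy functional: asks whether x + 1 is in the oracle (so it queries x + 1 < s)."""
    if s <= x + 1:
        return None                      # not yet converged within s steps
    return 1 if (x + 1) in oracle else 0

def friedberg_muchnik(functionals, stages):
    """Run the toy finite-injury construction for the given number of stages."""
    n = 2 * len(functionals)             # requirements R_0, ..., R_{n-1}
    A, B = set(), set()
    witness = [None] * n                 # witness[k] is set while R_k is fulfilled
    init_stage = [0] * n                 # stage at which R_k was last initialized
    injuries = [0] * n
    for s in range(1, stages + 1):
        for k in range(n):               # scan in order of decreasing priority
            if witness[k] is not None:
                continue
            own, other = (A, B) if k % 2 == 0 else (B, A)
            phi = functionals[k // 2]
            # least candidate x with r < x < s whose computation outputs 0
            x = next((x for x in range(init_stage[k] + 1, s)
                      if x % n == k and x not in own
                      and phi(other, x, s) == 0), None)
            if x is None:
                continue                 # R_k does not require attention now
            own.add(x)                   # diagonalize: chi_own(x) = 1 but phi(x) = 0
            witness[k] = x
            for j in range(k + 1, n):    # initialize all lower-priority requirements
                if witness[j] is not None:
                    injuries[j] += 1
                witness[j] = None
                init_stage[j] = s
            break                        # one requirement acts per stage
    return A, B, witness, injuries

A, B, witness, injuries = friedberg_muchnik([phi_zero, phi_next], 30)
```

In this run every requirement ends up fulfilled, and each fulfilled requirement R_k keeps a witness that lies in its own set while the corresponding toy functional still outputs 0 on the final opposite set. The choice of candidates above the last initialization stage is what protects the computations of higher-priority requirements.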
14.3.3 Priority Arguments 


We have seen that Friedberg and Muchnik’s solution to Post’s Problem is somewhat 
difficult to follow. This is not a coincidence. Today, when the Priority Method is 
the main technique for establishing results about c.e. sets, it is known that priority 
arguments are usually very complex and sophisticated. First, the requirements and 
the strategy by which they are fulfilled must be carefully constructed to produce 
the required result—and this must be done for each problem separately. In addition, 
requirements can be first fulfilled and later injured, so the membership of a number 
in the constructed set can be first determined and later undone. 

However, as the results obtained by priority arguments have been multiplying, at- 
tempts to make priority arguments more systematic and intelligible have also started 
appearing. The aim of these attempts is to isolate the general principles that are com- 
mon to existing priority arguments (and, hopefully, to those yet to be constructed). 
Ultimately, the attempts should lead to a framework offering a uniform approach to 
the construction of priority arguments. 

While waiting for such a systematic simplification of priority arguments, it has 
become desirable to prove results without priority arguments, or to see whether 
results proved with priority arguments can also be proved without them. For example, 
in 1986, Kučera⁴ devised a solution to Post’s Problem without using the priority 
method. Kučera’s proof is involved too, but the resulting set is less artificial. 


14.4 Some Properties of C.E. Degrees 


When it became known that there are more than just two c.e. degrees, research into 
c.e. degrees started to flourish. Many of the results were (and are being) obtained by 
priority arguments. We now briefly list some of the first results. For further results 
see the Bibliographic Notes to this chapter. 


1. Every c.e. degree is ≤ 0’. 

2. Not every T-degree which is ≤ 0’ is a c.e. degree. 

3. Density Theorem: Between any two c.e. degrees there is a third c.e. degree. 

4. There are two c.e. degrees with no glb in the c.e. degrees. 

5. There is a pair of nonzero c.e. degrees whose glb is 0. 

6. Nondiamond Theorem: There is no pair of c.e. degrees whose glb is 0 and lub is 0’. 


The above theorems were proved in the mid-1960s. In particular, 3 was proved in 
1964 by Sacks; 4 and 5 were proved in 1966 by Lachlan⁵ and Yates; and 6 was 
proved in 1966 by Lachlan. See Bibliographic Notes to this chapter for the details. 


⁴ Antonín Kučera, Czech mathematician, logician, and computer scientist. 
⁵ Alistair H. Lachlan, Canadian mathematician, logician, and computer scientist. 
14.5 Chapter Summary 


A c.e. set is said to be Turing complete (T-complete) if every c.e. set is T-reducible 
to it. The set K is T-complete. A T-degree is said to be c.e. if it contains a c.e. set. 
Both 0 and 0’ are c.e. degrees. Post’s Problem asks whether or not there exists a 
c.e. degree c that is strictly between 0 and 0’, that is, 0 < c < 0’. 

Post attempted a solution to his problem by devising a program which is now 
called Post’s Program. His aim was to define a structural property of c.e. sets that 
would guarantee the existence, undecidability, and Turing incompleteness of c.e. sets 
having this property. In trying to achieve that, Post defined various reducibilities and 
special kinds of c.e. sets, but did not succeed in attaining his program. 

A positive solution to Post’s Problem was obtained in 1956 by Friedberg and 
Muchnik, who devised and used a new method called the Priority Method. In this 
method, a conjunction of simple requirements must be fulfilled. As more and more 
requirements are fulfilled, a larger and larger part of the set to be constructed is 
uncovered. However, the requirements are interrelated, so the requirements that have 
already been fulfilled may become unfulfilled again, thus temporarily concealing a 
part of the uncovered set. Such injured requirements must be fulfilled again in some 
other way, and the process repeats. Although involved, the Priority Method is today 
the main technique for establishing results about c.e. sets. 

Proofs that use this method are called priority arguments. There are attempts to 
isolate the general principles common to priority arguments and to integrate them 
into a framework that would offer a uniform approach to the construction of priority 
arguments. 

At the same time, priority-free proofs are searched for because the sets constructed 
by them are less artificial. One such is Kučera’s priority-free solution to Post’s 
Problem. 


Problems 


14.1. Prove: If C is a creative set, then C̄ contains an infinite c.e. subset. 


[Hint. Let φ be the production function of C. Let n be an index of the empty set, i.e., ∅ = W_n. 
Consider the set W = {x_1, x_2, ...}, where x_i is defined inductively as follows: W_{x_1} = {φ(n)} 
and W_{x_{i+1}} = W_{x_i} ∪ {φ(x_i)}. So, the construction of the set W starts with W_n = ∅ as the first 
current set and then, in each step, adds to the current set its witness to obtain the next current 
set.] 


14.2. Prove: If C is a creative set, then C̄ contains infinitely many infinite c.e. subsets. 
[Hint. See Problem 14.1 and use any n ∈ ind(∅).] 


14.3. Prove: K is a creative set. 
Chapter 15 
The Arithmetical Hierarchy 


For every sensible question there is an answer; for every answer 
there is a sensible question. 


Abstract In this chapter we will introduce a different view of sets of natural num- 
bers. Sometimes such a set can be defined by a property of its members, where the 
property is expressed by a formula of Formal Arithmetic. Sets defined by formulas 
of the same complexity constitute an arithmetical class. Different complexities of 
formulas give rise to different arithmetical classes. There is also an ordering between 
these classes, so they form the so-called Arithmetical Hierarchy. We will show that 
the Arithmetical Hierarchy is closely connected with the Jump Hierarchy. 


15.1 Decidability of Relations 


Before we delve deeper into the main subject of this chapter, we must prepare the 
ground by defining a few new notions and proving some basic facts about them. 

A k-ary relation on a set S is a subset R of S^k. If R is a k-ary relation on S, it 
is customary to write R(a_1,...,a_k) to indicate that (a_1,...,a_k) ∈ R. When k = 1 we 
say that R is a property defined on S; when k = 2 we call R a binary relation on S. 
Just like any other set, the set R can also be decidable, semi-decidable, undecidable, 
or can have any other property sensible for sets. In this case we say that R has (or 
does not have) such a property. Here are the definitions that we will need. 


Definition 15.1. (Decidable Relation) A k-ary relation R on a set S is decidable 
(or semi-decidable, or undecidable) if the corresponding set R ⊆ S^k is 
decidable (or semi-decidable, or undecidable). 


Remarks. 1) Instead of decidable (semi-decidable, undecidable) relation, we may say computable 
(c.e., incomputable) relation. 2) If R is decidable, we can decide, for any (a_1,...,a_k) ∈ S^k, whether 
or not R(a_1,...,a_k). If R is undecidable but still semi-decidable, there is an algorithm guaranteed to 
return an answer (a YES) only for k-tuples that are in R, and which returns NO or fails to terminate 
for k-tuples in the complement of R. 
We will now fix the set S to S = N. Relations on N are said to be arithmetical. 
Example 15.1. (Relation R_halt) Let us define an arithmetical relation R_halt(e,x,s) as follows: 
R_halt(e,x,s) ≡ “Turing machine T_e halts on input x in at most s steps and returns a result.” 


This relation is decidable: Given any (e,x,s) ∈ N^3, construct T_e (see Sect. 7.2), start T_e on x, and 
let it run until the first s steps have been completed or a result has been output. Answer YES if a 
result has been output within these s steps, and NO otherwise. 
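
The decision procedure can be sketched in Python. The sketch assumes, purely for illustration, that a machine is modelled as a Python generator function that yields once per computation step and returns its result on halting; one step is one call of `next`, and the call on which the generator finishes counts as the halting step. The names `r_halt` and `collatz_machine` are hypothetical, and the decoding of an index e into the machine T_e (Sect. 7.2) is not modelled.

```python
def r_halt(machine, x, s):
    """Decide whether `machine` halts on input x within at most s steps."""
    run = machine(x)
    for _ in range(s):
        try:
            next(run)            # perform one computation step
        except StopIteration:
            return True          # the machine halted and returned a result
    return False                 # s steps completed without halting

def collatz_machine(x):
    """Toy machine: iterate the Collatz map until 1 is reached (never halts on 0)."""
    while x != 1:
        x = x // 2 if x % 2 == 0 else 3 * x + 1
        yield                    # one step completed
    return x
```

Because the simulation is cut off after s steps, `r_halt` always terminates, even on inputs such as 0 on which the toy machine runs forever.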


Let us now take an arbitrary decidable relation R(x,y) on N and define the set A 
to be A = {x ∈ N | ∃y R(x,y)}. So A consists of those numbers that are R-related to 
some number. What can be said about the decidability of A? Here is the answer. 


Theorem 15.1. A set A ⊆ N is c.e. iff A = {x ∈ N | ∃y R(x,y)} for some decidable 
relation R on N. 


Proof. (⇐) Let R be an arbitrary decidable relation on N. Let D_R be a decider of the corre- 
sponding set R, and G_{N²} a pair generator (see Sect. 6.3.4 and Box 6.3). Then we can construct 
a generator G_A of the set A as follows. G_A repeats the following sequence of steps: (1) it calls 
G_{N²} to generate the next (x,y); (2) it calls D_R to see whether (x,y) is in R; (3) if (x,y) ∈ R then it 
generates (outputs) x. Since A can be generated by a TM, it is a c.e. set. (⇒) If A is c.e., then it 
is the domain of a p.c. function, φ_e (Problem 6.7). So A = {x ∈ N | ∃s R_halt(e,x,s)}, with R_halt from 
Example 15.1. 
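
The generator built in the (⇐) direction can be sketched in Python. This is an illustration under assumptions: `pairs` plays the role of the pair generator, the Python predicate passed as `R` plays the role of the decider, and `is_square_witness` is a hypothetical decidable relation chosen only for the example, so that A is the set of perfect squares.

```python
from itertools import count, islice

def pairs():
    """Pair generator: yields every (x, y) in N x N exactly once, diagonal by diagonal."""
    for total in count(0):
        for x in range(total + 1):
            yield x, total - x

def generate_A(R):
    """Generator of A = {x | there is a y with R(x, y)} for a decidable relation R."""
    for x, y in pairs():         # step (1): next pair
        if R(x, y):              # step (2): ask the decider
            yield x              # step (3): output x

def is_square_witness(x, y):
    """Hypothetical decidable relation: y witnesses that x is a perfect square."""
    return y * y == x

first_five = list(islice(generate_A(is_square_witness), 5))
```

For a general R the generator may output the same x several times, once for each witness y; this is harmless, since a generator of a c.e. set is allowed to repeat elements.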


15.2 The Arithmetical Hierarchy 


In the 1940s, Kleene and Mostowski¹ were exploring the sets of natural numbers 
that are defined as {x ∈ N | F(x)}, where F(x) is a formula of Formal Arithmetic, 
and x is a free individual variable in F (see Sect. 3.2). 

Kleene was investigating how the syntactical complexity of F(x) affects the decidability 
of the set {x ∈ N | F(x)}. The obvious question was how to measure the 
syntactical complexity of F (x). Here, Kleene leaned on the well-known fact, which 
was published already in 1885 by Peirce, that every formula can be transformed 
into prenex normal form (pnf), i.e., a logically equivalent formula consisting of a 
string of quantifiers followed by a quantifier-free formula. Kleene could therefore 
assume that F(x) is of the form Q_1 y_1 ... Q_k y_k R(x, y_1, ..., y_k), where y_i ≠ y_j for i ≠ j, 
Q_i is ∀ or ∃, and R is a decidable arithmetical relation (predicate). In addition, adjacent 
quantifiers of the same kind (i.e., those having the same quantification symbol) 
can be contracted and replaced by a single quantifier of that kind (see Box 15.1). 
After all possible contractions have been performed, the resulting prenex normal 
form has a sequence of alternating quantification symbols, i.e., the formula F (x) 
has been transformed either into the form 


¹ Andrzej Mostowski, 1913–1975, Polish mathematician. 
∃y_1 ∀y_2 ∃y_3 ... Q y_n R(x, y_1, y_2, ..., y_n), 


where Q is ∃ if n is odd, and ∀ if n is even; or into the form 


∀y_1 ∃y_2 ∀y_3 ... Q y_n R(x, y_1, y_2, ..., y_n), 


where Q is ∀ if n is odd, and ∃ if n is even. 


Box 15.1 (Contraction of Quantifiers in Prenex Normal Forms). 


We describe the contraction of quantifiers on an example. Let $\exists y_1 \exists y_2 \forall y_3 \forall y_4 \forall y_5 P(y_1,y_2,y_3,y_4,y_5)$
be a formula. We introduce two individual variables $u = (y_1,y_2)$ and $v = (y_3,y_4,y_5)$. Recall that the
projection function $\pi_i^k$ returns the $i$th component of a $k$-tuple (Box 5.1, p. 82). Using $u$, $v$, $\pi^2$ and
$\pi^3$ we transform the formula into an equivalent formula $\exists u \forall v P(\pi_1^2(u), \pi_2^2(u), \pi_1^3(v), \pi_2^3(v), \pi_3^3(v))$.
This is the contracted pnf. Generally, given a formula $Q_1 y_1 \ldots Q_k y_k P(y_1, \ldots, y_k)$, we first partition
$Q_1 y_1 \ldots Q_k y_k$ into maximal subsequences of adjacent quantifiers of the same kind; then we
introduce, for each subsequence, a new individual variable (tuple); next, we replace each subsequence
by the corresponding quantifier of that kind; and use projection functions in the predicate $P$.
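
The contraction of two existential quantifiers into one can be tried out in a few lines of Python. The sketch below rests on assumptions of ours: it codes pairs with the Cantor pairing function (the coding of Box 5.1 may differ), `unpair` plays the role of the projections $\pi_1^2$ and $\pi_2^2$, and the predicate `p` is a hypothetical example. A single unbounded search over $u$ then replaces the two nested searches over $y_1$ and $y_2$.

```python
from itertools import count
from math import isqrt

def pair(a, b):
    """Cantor pairing: a bijection from N x N onto N."""
    return (a + b) * (a + b + 1) // 2 + b

def unpair(u):
    """Inverse of pair; returns the two components (pi_1(u), pi_2(u))."""
    w = (isqrt(8 * u + 1) - 1) // 2   # index of the diagonal containing u
    b = u - w * (w + 1) // 2
    return w - b, b

def search_two(p):
    """Find a witness for 'exists y1 exists y2: p(y1, y2)' as 'exists u: p(pi_1(u), pi_2(u))'."""
    for u in count(0):
        y1, y2 = unpair(u)
        if p(y1, y2):
            return y1, y2

def p(y1, y2):
    # Hypothetical decidable predicate (an assumption for this example only).
    return y1 > 0 and y2 > 0 and y1 * y1 + y2 * y2 == 25

witness = search_two(p)
print(witness)
```

The search terminates here only because a witness exists; for an unsatisfiable predicate it would run forever, as expected of a semi-decision procedure.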


So, Kleene could assume that $F(x)$ is already in the contracted prenex normal form.
Any set $\{x \in \mathbb{N} \mid F(x)\}$ defined by such a distinguished $F(x)$ he called arithmetical.


Definition 15.2. (Arithmetical Set) A set $A$ is an arithmetical set if $A = \{x \in \mathbb{N} \mid F(x)\}$,
such that, for some $n \geq 0$, the predicate $F(x) \equiv \exists y_1 \forall y_2 \exists y_3 \ldots Q y_n R(x, y_1, y_2, \ldots, y_n)$ or
$F(x) \equiv \forall y_1 \exists y_2 \forall y_3 \ldots Q y_n R(x, y_1, y_2, \ldots, y_n)$, and $R$ is a decidable arithmetical relation.
In particular, when $n = 0$, we define $F(x) \equiv R(x)$.


Then he defined the syntactical complexity of $F(x)$ as the number $n$ of quantification
symbols in $F(x)$. Depending on $n$ and the first quantification symbol of $F(x)$, he
classified the arithmetical sets $\{x \in \mathbb{N} \mid F(x)\}$ into various classes. He called these
classes arithmetical and denoted them by $\Sigma_n$, $\Pi_n$, and $\Delta_n$. Here is the definition.


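Definition 15.3. (Arithmetical Classes) The arithmetical classes $\Sigma_n$, $\Pi_n$, and $\Delta_n$
are defined as follows:


$\Sigma_n$ = class of all sets $\{x \in \mathbb{N} \mid F(x)\}$, where $F(x) \equiv \exists y_1 \forall y_2 \exists y_3 \ldots Q y_n R(x, y_1, \ldots, y_n)$
for some decidable arithmetical relation $R$;


$\Pi_n$ = class of all sets $\{x \in \mathbb{N} \mid F(x)\}$, where $F(x) \equiv \forall y_1 \exists y_2 \forall y_3 \ldots Q y_n R(x, y_1, \ldots, y_n)$
for some decidable arithmetical relation $R$;


$\Delta_n$ = class of all sets $\{x \in \mathbb{N} \mid F(x)\}$ that are in $\Sigma_n \cap \Pi_n$.


Example 15.2. ($K$ is in $\Sigma_1$) The set $K$ is in $\Sigma_1$ because $K \stackrel{\text{def}}{=} \{x \mid \varphi_x(x)\downarrow\} = \{x \mid \exists s\, R_{halt}(x,x,s)\}$,
where $R_{halt}$ is the (decidable) relation from Example 15.1.


Example 15.3. ($K_0$ is in $\Sigma_1$) This is because $K_0 \stackrel{\text{def}}{=} \{\langle e,x \rangle \mid \varphi_e(x)\downarrow\} = \{\langle e,x \rangle \mid \exists s\, R_{halt}(e,x,s)\} =
\{\langle e,x \rangle \mid \exists s\, R_{halt}(\langle e,x \rangle_1, \langle e,x \rangle_2, s)\}$, where $\langle e,x \rangle_1 = e$, $\langle e,x \rangle_2 = x$, and $R_{halt}$ is from Example 15.1.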


We now justify the title of this chapter, the Arithmetical Hierarchy. So, is there a 
hierarchy of arithmetical classes? Yes; in 1943, Kleene proved the following 


Theorem 15.2. For any $n > 0$, the following hold:


a) $\Sigma_n \subsetneq \Sigma_{n+1}$        b) $\Pi_n \subsetneq \Pi_{n+1}$
c) $\Sigma_n \subsetneq \Pi_{n+1}$        d) $\Pi_n \subsetneq \Sigma_{n+1}$
e) $\Delta_n \subsetneq \Sigma_n$          f) $\Delta_n \subsetneq \Pi_n$


Proof idea. In the first part we prove, for each of the above relations, that the
left-hand side is related to the right-hand side with the relation $\subseteq$. To do this, we
introduce a dummy variable $y_{n+1}$. In the second part we prove, for each $X \subsetneq Y$ of the
relations $\Sigma_n \subsetneq \Sigma_{n+1}$, $\Pi_n \subsetneq \Pi_{n+1}$, $\Sigma_n \subsetneq \Pi_{n+1}$, $\Pi_n \subsetneq \Sigma_{n+1}$, $\Delta_n \subsetneq \Sigma_n$, $\Delta_n \subsetneq \Pi_n$, that
$X \neq Y$. We use diagonalization to prove that there is a set that is in $Y$ but not in $X$.
The diagonal argument is a generalization of the argument that we used in proving
the existence of undecidable c.e. sets. (See Box 15.2 for further details.)


Box 15.2 (Proof of Theorem 15.2). 


a) Let $S$ be an arbitrary element of the arithmetical class $\Sigma_n$. Then $S = \{x \in \mathbb{N} \mid F(x)\}$,
where $F(x) \equiv \exists y_1 \forall y_2 \exists y_3 \ldots Q y_n R(x, y_1, \ldots, y_n)$ for some decidable arithmetical relation $R$. Now
define a new relation $R'(x, y_1, \ldots, y_n, y_{n+1}) \equiv R(x, y_1, \ldots, y_n) \wedge (y_{n+1} = y_{n+1})$ and a new formula
$F'(x) \equiv \exists y_1 \forall y_2 \exists y_3 \ldots Q y_n Q' y_{n+1} R'(x, y_1, \ldots, y_n, y_{n+1})$, where $Q'$ denotes the alternate of $Q$.
Finally, observe that $S = \{x \in \mathbb{N} \mid F'(x)\} \in \Sigma_{n+1}$.

b) The proof is similar to the proof of a), except that $F(x) \equiv \forall y_1 \exists y_2 \forall y_3 \ldots Q y_n R(x, y_1, \ldots, y_n)$
and $F'(x) \equiv \forall y_1 \exists y_2 \forall y_3 \ldots Q y_n Q' y_{n+1} R'(x, y_1, \ldots, y_n, y_{n+1})$.

c) Let $S \in \Sigma_n$ and $S = \{x \in \mathbb{N} \mid F(x)\}$, where $F(x) \equiv \exists y_1 \forall y_2 \exists y_3 \ldots Q y_n R(x, y_1, \ldots, y_n)$ for
some decidable arithmetical relation $R$. Observe that $F(x) \equiv \exists y_1 \forall y_2 \exists y_3 \ldots Q y_n R(x, y_1, \ldots, y_n) \equiv
\forall y_{n+1} \exists y_1 \forall y_2 \exists y_3 \ldots Q y_n [R(x, y_1, \ldots, y_n) \wedge (y_{n+1} = y_{n+1})]$, where we have introduced a new variable
$y_{n+1}$. Now define a new relation $R'(x, y_1, \ldots, y_n, y_{n+1}) \equiv R(x, y_1, \ldots, y_n) \wedge (y_{n+1} = y_{n+1})$ and a new
formula $F'(x) \equiv \forall y_{n+1} \exists y_1 \forall y_2 \exists y_3 \ldots Q y_n R'(x, y_1, \ldots, y_n, y_{n+1})$. Then, after appropriate renaming of
the variables $y_i$, we see that $S = \{x \in \mathbb{N} \mid F'(x)\} \in \Pi_{n+1}$.

d) The proof is similar to the proof of c), except that $F(x) \equiv \forall y_1 \exists y_2 \forall y_3 \ldots Q y_n R(x, y_1, \ldots, y_n)$
and $F'(x) \equiv \exists y_{n+1} \forall y_1 \exists y_2 \forall y_3 \ldots Q y_n R'(x, y_1, \ldots, y_n, y_{n+1})$.

e,f) $\Delta_n \subseteq \Sigma_n$ follows directly from the definition of $\Delta_n$. The same holds for $\Delta_n \subseteq \Pi_n$.

We demonstrate the second part of the proof in cases e) and f). We will construct a set $P$ such
that $P \in \Sigma_n - \Pi_n$ and $\overline{P} \in \Pi_n - \Sigma_n$. Since $\Delta_n = \Sigma_n \cap \Pi_n$, parts e) and f) of the theorem will follow.

Each element of $\Sigma_1$ is a c.e. set and hence the domain of a p.c. function (see Problem 6.7). But
p.c. functions can be effectively enumerated (see Proposition 6.1 and Definition 6.4). It follows
that we can effectively enumerate the elements of $\Sigma_1$. We can also enumerate the elements of $\Pi_1$.
This is because if $B \in \Pi_1$ then $B = \overline{A_e}$ for some $A_e \in \Sigma_1$, and we can rename $B$ as $B_e$. From the
two enumerations we can construct enumerations of elements of $\Sigma_2$ and $\Pi_2$ and then of $\Sigma_n$ and $\Pi_n$
for higher $n$. Thus we can speak, for any $n \geq 1$, of the $e$th set of the class $\Sigma_n$ or the class $\Pi_n$.

Now define a set $S$ as follows: $S = \{\langle e,x \rangle \in \mathbb{N} \mid \text{the } e\text{th set in } \Sigma_n \text{ contains } x\}$. The set $S$ is in $\Sigma_1$,
because we can write $S = \{\langle e,x \rangle \in \mathbb{N} \mid \exists s\, R_{halt}(e,x,s)\}$, where $R_{halt}$ is the relation from Example
15.1. Consequently, $S$ is in $\Sigma_n$ for any $n \geq 1$. (Notice that $S$ is related to the universal set (language)
$K_0$; see Definition 8.5 on p. 181.)

Based on $S$ we finally define the set $P$ as follows: $P = \{x \in \mathbb{N} \mid \langle x,x \rangle \in S\}$. The set $P$ is in
$\Sigma_1$, because $P = \{x \in \mathbb{N} \mid \exists s\, R_{halt}(x,x,s)\}$. ($P$ is related to the diagonal set $K$; see Definition 8.6.)
Hence, $P \in \Sigma_n$. However, $P \notin \Pi_n$. (Suppose the contrary: $P \in \Pi_n$. Then it would follow that
$\overline{P} \in \Sigma_n$, so $\overline{P}$ would be the $e'$th set in $\Sigma_n$ for some $e'$. Then, by definition of $S$, we would have that
$e' \in \overline{P} \Leftrightarrow \langle e',e' \rangle \in S$. But, by definition of $P$, we would also have $e' \in \overline{P} \Leftrightarrow e' \notin P \Leftrightarrow \langle e',e' \rangle \notin S$,
which would be a contradiction.) So, $P \in \Sigma_n - \Pi_n$. Similarly we find that $\overline{P} \in \Pi_n - \Sigma_n$.


Informally, Theorem 15.2 e,f tell us that not every arithmetical set can be defined in
both ways, that is, both as a member of $\Sigma_n$ and as a member of $\Pi_n$. Next, for any
$n > 0$, there are arithmetical sets $\{x \in \mathbb{N} \mid F(x)\}$ that cannot be defined by properties
$F(x)$ having just $n$ alternating quantifiers (Theorem 15.2 a,b,c,d).


The inclusions between the arithmetical classes are depicted in Fig. 15.1. 


Fig. 15.1 a) The inclusions between the arithmetical classes at two successive levels; b) Hasse
diagram representing the classes of two successive levels ordered by inclusion


To get some feeling for the arithmetical classes, let us see what kinds of sets are
gathered in the classes at the lowest levels of the hierarchy. For $n = 0$, any formula
$F(x)$ is just a decidable unary relation $R(x)$, so the corresponding set $\{x \in \mathbb{N} \mid F(x)\}$
is decidable. Obviously, this is also true of the members of $\Pi_0$ and $\Delta_0$. Thus,


$\Sigma_0 = \Pi_0 = \Delta_0$ = class of all decidable sets.

Now take $n = 1$. The class $\Sigma_1$ contains all the sets defined by $\{x \in \mathbb{N} \mid \exists y_1 R(x,y_1)\}$.
By Theorem 15.1 any such set is c.e., so


$\Sigma_1$ = class of all c.e. sets.


What about the class $\Pi_1$? By definition it is the class of all the sets that are defined
by $\{x \in \mathbb{N} \mid \forall y_1 R(x,y_1)\}$, or, equivalently, by $\{x \in \mathbb{N} \mid \neg\exists y_1 \neg R(x,y_1)\}$. But any such
set is the complement of some c.e. set (namely, the set $\{x \in \mathbb{N} \mid \exists y_1 \neg R(x,y_1)\}$) and
is therefore called co-c.e. Consequently,


$\Pi_1$ = class of all co-c.e. sets.


The class $\Delta_1$ is, by definition, equal to $\Sigma_1 \cap \Pi_1$, so it contains all the sets that are
both c.e. and co-c.e. But, by Theorem 7.3 (p. 156), such sets are decidable. It follows
that

$\Delta_1$ = class of all decidable sets.
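
Why a set that is both c.e. and co-c.e. is decidable can be illustrated with a short Python sketch. It assumes that the set is given as $\{x \mid \exists y\, R(x,y)\}$ and its complement as $\{x \mid \exists y\, S(x,y)\}$ with decidable $R$ and $S$; the two witness relations below (for the even and the odd numbers) are hypothetical examples of ours. Since every $x$ has a witness on exactly one side, the search always terminates.

```python
from itertools import count

def decide(x, in_set, in_complement):
    """Decide membership of x by searching for a witness on either side."""
    for y in count(0):
        if in_set(x, y):          # witness that x is in the set
            return True
        if in_complement(x, y):   # witness that x is in the complement
            return False

def even_witness(x, y):
    # Hypothetical R(x, y): x is even, witnessed by y = x / 2.
    return x == 2 * y

def odd_witness(x, y):
    # Hypothetical S(x, y): x is odd, witnessed by y = (x - 1) / 2.
    return x == 2 * y + 1

print(decide(10, even_witness, odd_witness), decide(7, even_witness, odd_witness))
```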


15.3 The Link with the Jump Hierarchy 


What about the arithmetical classes $\Sigma_n$, $\Pi_n$, $\Delta_n$, $n = 2,3,\ldots$, which reside on higher
levels of the arithmetical hierarchy? What kinds of sets are gathered in these classes? 
Do all the sets in a given arithmetical class share the same degree of unsolvability, 
as was the case with n = 0 and partly with n = 1? If so, then is there any relationship 
between the degrees of unsolvability represented by the arithmetical classes and the 
Turing degrees? In other words, is there any connection between the arithmetical 
hierarchy and the jump hierarchy? These questions were raised by Post in the mid- 
1940s. 

To answer the questions we must somehow involve the concepts of reducibility 
and the Turing jump, which we established in previous chapters, in the arithmetical 
classes and their hierarchy. 

To achieve this, we will start by defining a new concept. Since the members of 
arithmetical classes are sets, it is perfectly sensible to consider their reducibility. 
For example, for any two given members of an arithmetical class we can investigate 
whether one is m-reducible to the other. Similarly, we can distinguish a member of 
a class from other members of the class, and call it complete in that class if every 
member of the class is m-reducible to it. Here is the definition. 


Definition 15.4. ($\Sigma_n$-Complete Set) A set $A$ is $\Sigma_n$-complete if $A \in \Sigma_n$ and $Y \leq_m A$
for every $Y \in \Sigma_n$. $\Pi_n$-complete and $\Delta_n$-complete sets are defined similarly.


The class $\Sigma_1$ will give us some clues about what might be the relationship
between the jump and arithmetical hierarchies:


a) We know that $K \in \Sigma_1$ (see Example 15.2). We also know that $K$ is m-complete
(see Problem 9.2, p. 228). So, by Definition 15.4, the set $K$ is $\Sigma_1$-complete. Now
recall that $K = \emptyset'$ (see Definition 12.1, p. 265). It follows that $\emptyset'$ is $\Sigma_1$-complete.
Observing this we notice a possible pattern which we express in the speculation:


Is it perhaps true that for every $n \geq 0$ the set $\emptyset^{(n+1)}$ is $\Sigma_{n+1}$-complete?


b) Let $A \in \Sigma_1$. Then $A$ is c.e. (see Sect. 15.2) and there is an ordinary TM
recognizing $A$. The recognizer is equivalent to an o-TM with the oracle set $\emptyset$, say $T^{\emptyset}$.
So, $A$ is $\emptyset$-c.e. Observing this we may be led to the next bold speculation:


Is it perhaps true that for every $n \geq 0$ the class $\Sigma_{n+1}$ consists of $\emptyset^{(n)}$-c.e. sets?


Indeed, in 1948 Post announced that the answer to both questions is yes. This 
is stated in the following Post’s Theorem, which relates the jump hierarchy to the 
arithmetical hierarchy. 


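Theorem 15.3. Let $A \subseteq \mathbb{N}$ and $n \geq 0$. Then:
a) $\emptyset^{(n+1)}$ is $\Sigma_{n+1}$-complete;
b) $A \in \Sigma_{n+1} \Leftrightarrow A$ is $\emptyset^{(n)}$-c.e.;
c) $A \in \Delta_{n+1} \Leftrightarrow A \leq_T \emptyset^{(n)}$.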


Proof idea. First, part b is proven by induction on n. This is then used to prove parts 
a and c. For details see the Bibliographic Notes to this chapter. 


Since the theorem is important, we invest some time in commenting on it: 


• Part a reveals the link between the concepts of the Turing jump and the class $\Sigma_n$
in terms of m-reducibility. For $n = 0$ it tells us that the set $\emptyset'$ ($= K^{\emptyset} = K$) is in $\Sigma_1$
and that any set $Y \in \Sigma_1$ is m-reducible to $K$. Well, we already knew that. But, for
$n = 1$, it tells us that the second jump $\emptyset''$ ($= (\emptyset')' = K' = K^K$) is in $\Sigma_2$ and every
set $Y \in \Sigma_2$ is m-reducible to $K^K$. And for $n = 2$ it says that $\emptyset'''$ ($= K^{K^K}$) $\in \Sigma_3$
and any set in $\Sigma_3$ is m-reducible to $\emptyset'''$; that is, $\emptyset'''$ is $\Sigma_3$-complete.

• Part b reveals the connection between any two consecutive classes $\Sigma_n$ and $\Sigma_{n+1}$ in
terms of their relative computability. Specifically, $\Sigma_{n+1}$ contains exactly the sets
that would become c.e. if there existed an oracle for the set $\emptyset^{(n)}$. For example,
the sets in $\Sigma_1$ are $\emptyset$-c.e. (that is, c.e.), which is nothing new. But the sets in $\Sigma_2$ are
$\emptyset'$-c.e. (that is, $K$-c.e.), so they can be recognized by o-TMs with the oracle set
$K$. Similarly, any set in $\Sigma_3$ is $\emptyset''$-c.e., so it can be recognized with an o-TM with
the oracle set $K^K$.

• Part c reveals the link between the concepts of the arithmetical class $\Delta_{n+1}$ and
the $\Sigma_n$-complete set in terms of Turing reducibility. Indeed, it tells us that any set
of the class $\Delta_{n+1}$ can be T-reduced to $\emptyset^{(n)}$, a $\Sigma_n$-complete set.


15.4 Practical Consequences: Proving Incomputability


The link between the jump hierarchy and the arithmetical hierarchy gives rise to yet
another method for proving the undecidability of sets and, consequently, of decision
problems. Recall that a decision problem $D$ is represented by the corresponding
formal language $L(D)$, a subset of the standard universe $\Sigma^*$ (see Sect. 8.1.2). But $L(D)$
is associated, via a bijection $f : \Sigma^* \to \mathbb{N}$, with the set $f(L(D)) \subseteq \mathbb{N}$ (see Sect. 6.3.6).
Now, if this set is arithmetical, i.e., $f(L(D)) = \{x \in \mathbb{N} \mid F(x)\}$ with $F(x)$ complying
with Definition 15.2, then finding $F(x)$ with the minimal number $n$ of quantifiers, such
that $f(L(D))$ is in at least one of the classes $\Sigma_n$ and $\Pi_n$, uncovers the degree of
undecidability of the set $L(D)$ and, therefore, of the decision problem $D$.

So, the method is as follows. Given a decision problem $D$, construct the simplest
arithmetical description of the set $f(L(D))$, i.e., construct the predicate $F(x)$
with the minimal number $n$ of quantifiers that complies with Definition 15.2 such that
$f(L(D)) = \{x \in \mathbb{N} \mid F(x)\}$.


The steps of the method are then as follows. 


Method. (Proof by Arithmetical Hierarchy) The degree of undecidability of
a decision problem $D$ can be found as follows:


1. Consider the set $L(D)$.

2. Try to prove: For some $n \geq 0$, the set $f(L(D)) = \{x \in \mathbb{N} \mid F(x)\}$, where $F(x)$
is either $\exists y_1 \forall y_2 \ldots Q y_n R(x, y_1, y_2, \ldots, y_n)$ or $\forall y_1 \exists y_2 \ldots Q y_n S(x, y_1, y_2, \ldots, y_n)$,
and $R$, $S$ are decidable relations.

3. If step 2 succeeded and the obtained $n$ is the minimal number satisfying 2,
then $f(L(D))$ is in at least one of the classes $\Sigma_n$ and $\Pi_n$.


We now give some examples. 


Example 15.4. (Halting Problem $D_{Halt}$) The language of $D_{Halt}$ is $L(D_{Halt}) = \{\langle T,w \rangle \mid T \text{ halts on } w\}$
$= K_0$. We can rewrite it as $K_0 = \{\langle T,w \rangle \mid \exists s : T \text{ halts on } w \text{ in at most } s \text{ steps and returns a result}\}$.
The relation $R(T,w,s) \equiv$ “$T$ halts on $w$ in at most $s$ steps and returns a result” is decidable. So,
$K_0 = \{\langle T,w \rangle \mid \exists s\, R(T,w,s)\}$. The corresponding subset of $\mathbb{N}$ is $K_0 = \{x \mid \exists s\, R_{halt}(\pi_1^2(x), \pi_2^2(x), s)\}$,
where $\pi_1^2$, $\pi_2^2$ are projection functions and $R_{halt}$ is from Example 15.1. Thus, $K_0 \in \Sigma_1$.
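
The decidability of the step-bounded relation $R(T,w,s)$ can be made concrete with a small Python sketch. It is not a Turing machine simulator: as an assumption of ours, a "machine" is modeled by a Python generator function that yields once per computation step and returns when it halts, and the two example machines are hypothetical. The point is only that the check performs at most $s$ steps and therefore always terminates.

```python
def collatz(n):
    """Hypothetical 'machine': one yield per step; halts when n reaches 1."""
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        yield

def loop_forever(n):
    """Hypothetical 'machine' that never halts."""
    while True:
        yield

def halts_within(machine, w, s):
    """Decide R(T, w, s): does machine(w) halt after at most s steps?"""
    run = machine(w)
    for _ in range(s + 1):      # s steps, plus one attempt to observe the halt
        try:
            next(run)
        except StopIteration:
            return True
    return False

print(halts_within(collatz, 6, 8), halts_within(collatz, 6, 7))
```

Membership in $K_0$ then corresponds to the unbounded search for some $s$ with `halts_within(machine, w, s)`, which is a semi-decision procedure only.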


Example 15.5. (Empty Proper Set, $D_{Emp}$) The proper set of a Turing machine $T$ is the set $L(T) =
\{w \in \Sigma^* \mid T \text{ accepts } w\}$ (see Definition 6.8, p. 141), and the decision problem $D_{Emp}$ asks whether
or not $L(T) = \emptyset$ (see Sect. 8.3.1). The language of $D_{Emp}$ is $L(D_{Emp}) = Emp = \{\langle T \rangle \mid L(T) = \emptyset\}$.
We can restate it: $Emp = \{\langle T \rangle \mid \forall w \forall s : T \text{ does not accept } w \text{ in } s \text{ steps}\}$. The relation $R(T,w,s) \equiv$
“$T$ does not accept $w$ in $s$ steps” is decidable (just run $T$ on $w$ for at most $s$ steps). So, $Emp =
\{\langle T \rangle \mid \forall w \forall s\, R(T,w,s)\}$. The corresponding subset of $\mathbb{N}$ is $Emp = \{x \mid \forall t \forall s\, R(x,t,s)\}$, where $x =
f(\langle T \rangle)$ and $t = f(w)$. The prefix $\forall t \forall s$ can be contracted into $\forall y$, where $t = \pi_1^2(y)$ and $s = \pi_2^2(y)$, so
$Emp = \{x \mid \forall y\, R(x, \pi_1^2(y), \pi_2^2(y))\}$. Hence, $Emp \in \Pi_1$. It remains to show that $n = 1$ is minimal.


Example 15.6. (Finite Proper Set) This is the decision problem $D$ = “Is $L(T)$ finite?” Its
language is $L(D) = \{\langle T \rangle \mid L(T) \text{ is finite}\}$. Clearly, $L(T)$ is finite if and only if there is an
upper bound on the length of its words. That is, $L(T)$ is finite iff there exists an $\ell$ such that,
for any $w \in L(T)$, $|w| \leq \ell$. Therefore, $L(D) = \{\langle T \rangle \mid \exists \ell \forall w (w \in L(T) \Rightarrow |w| \leq \ell)\}$. Using the
equivalence $(A \Rightarrow B) \equiv (\neg A \vee B)$, we obtain $L(D) = \{\langle T \rangle \mid \exists \ell \forall w (w \notin L(T) \vee |w| \leq \ell)\}$. A
word is not in $L(T)$ iff $T$ does not accept it, regardless of the number $s$ of steps performed.
Hence, $L(D) = \{\langle T \rangle \mid \exists \ell \forall w \forall s ((T \text{ does not accept } w \text{ in } s \text{ steps}) \vee |w| \leq \ell)\}$. Now define the
relation $R(T,w,s,\ell) \equiv$ “($T$ does not accept $w$ in $s$ steps) $\vee\ |w| \leq \ell$.” The relation $R$ is decidable
(run $T$ on $w$ for $s$ steps, and if $w$ has been accepted, compare $|w|$ with $\ell$). Hence, $L(D)$ can
be expressed as $L(D) = \{\langle T \rangle \mid \exists \ell \forall w \forall s\, R(T,w,s,\ell)\}$. After contracting $\forall w \forall s$ into $\forall t$ we obtain
$L(D) = \{\langle T \rangle \mid \exists \ell \forall t\, R(T, \pi_1^2(t), \pi_2^2(t), \ell)\}$. The corresponding subset of $\mathbb{N}$ is in $\Sigma_2$. What remains
to be shown is that $n = 2$ is minimal.


Example 15.7. (Finite Function Domain, $D_{Fin}$) This is the problem “Is $dom(\varphi)$ finite?” Its
language is $Fin = \{e \mid dom(\varphi_e) \text{ is finite}\}$. Leaning on Example 15.6, we can easily prove that
$Fin \in \Sigma_2$. What remains to be shown is that $n = 2$ is minimal.


Example 15.8. (Function Totality, $D_{Tot}$) “Is a p.c. function $\varphi$ total?” This is the decision problem
$D_{Tot}$. Its language is $L(D_{Tot}) = Tot = \{e \mid \forall x\, \varphi_e(x)\downarrow\}$, or rewritten, $Tot = \{e \mid \forall x \exists s\, R_{halt}(e,x,s)\}$;
see Example 15.1. Thus, $Tot \in \Pi_2$. What remains to be shown is that $n = 2$ is minimal.


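Figure 15.2 shows the initial part of the arithmetical hierarchy for $n = 1,2,3,4$.
The corresponding $\Sigma_n$-complete sets are $\emptyset'$, $\emptyset''$, $\emptyset'''$, $\emptyset^{(4)}$, and the $\Pi_n$-complete sets
are $\overline{\emptyset'}$, $\overline{\emptyset''}$, $\overline{\emptyset'''}$, $\overline{\emptyset^{(4)}}$. We mention without proof that some of these are the sets
$K$, $Fin$, $Inf$, $Cof$, $Tot$ (and their complements) that we introduced in Sect. 8.3.5 and
used in Example 12.1 (p. 269). Actually, the following holds:


$\emptyset''' \equiv_m Cof$        $\overline{\emptyset'''} \equiv_m \overline{Cof}$
$\emptyset'' \equiv_m Fin$         $\overline{\emptyset''} \equiv_m Tot \equiv_m Inf$
$\emptyset' \equiv_m K$            $\overline{\emptyset'} \equiv_m \overline{K}$


Fig. 15.2 The lowest levels of the arithmetical hierarchy: ($n = 1$) $\Sigma_1$ is the class of c.e. sets; $\Pi_1$
is the class of co-c.e. sets; $\Delta_1$ is the class of decidable sets; $\emptyset'$ is $\Sigma_1$-complete; $\overline{\emptyset'}$ is $\Pi_1$-complete;
($n = 2$) $\Sigma_2$ is the class of $\emptyset'$-c.e. sets; $\Pi_2$ is the class of $\emptyset'$-co-c.e. sets; $\Delta_2$ is the class of $\emptyset'$-decidable
sets; $\emptyset''$ is $\Sigma_2$-complete; $\overline{\emptyset''}$ is $\Pi_2$-complete; ($n = 3$) $\Sigma_3$ is the class of $\emptyset''$-c.e. sets; $\Pi_3$ is the class of
$\emptyset''$-co-c.e. sets; $\Delta_3$ is the class of $\emptyset''$-decidable sets; $\emptyset'''$ is $\Sigma_3$-complete; $\overline{\emptyset'''}$ is $\Pi_3$-complete; ($n = 4$)
$\Sigma_4$ is the class of $\emptyset'''$-c.e. sets; $\Pi_4$ is the class of $\emptyset'''$-co-c.e. sets; $\Delta_4$ is the class of $\emptyset'''$-decidable
sets; $\emptyset^{(4)}$ is $\Sigma_4$-complete; and $\overline{\emptyset^{(4)}}$ is $\Pi_4$-complete


15.5 Chapter Summary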


A $k$-ary relation $R$ is decidable if there is a Turing machine capable of deciding,
for any given $k$-tuple of elements $(a_1,a_2,\ldots,a_k)$, whether or not $R(a_1,a_2,\ldots,a_k)$.
A set $A$ is arithmetical if it can be represented as $A = \{x \in \mathbb{N} \mid F(x)\}$ such that,
for some $n \geq 0$, the predicate $F(x) \equiv \exists y_1 \forall y_2 \exists y_3 \ldots Q y_n R(x, y_1, y_2, \ldots, y_n)$ or $F(x) \equiv
\forall y_1 \exists y_2 \forall y_3 \ldots Q y_n R(x, y_1, y_2, \ldots, y_n)$, and $R$ is a decidable relation on $\mathbb{N}$. The sets
defined by the properties $F(x) \equiv \exists y_1 \forall y_2 \exists y_3 \ldots Q y_n R(x, y_1, y_2, \ldots, y_n)$ are in the
arithmetical class $\Sigma_n$, and the sets defined by $F(x) \equiv \forall y_1 \exists y_2 \forall y_3 \ldots Q y_n R(x, y_1, y_2, \ldots, y_n)$
are in the arithmetical class $\Pi_n$. The intersection of $\Sigma_n$ and $\Pi_n$ is the arithmetical
class $\Delta_n$. The classes $\Sigma_n$ and $\Pi_n$ are proper subclasses of both $\Sigma_{n+1}$ and $\Pi_{n+1}$, and
$\Delta_n$ is a proper subclass of $\Sigma_n$ and $\Pi_n$. A member $A$ of $\Sigma_n$ is $\Sigma_n$-complete if every
member of $\Sigma_n$ is m-reducible to $A$; and a member $B$ of $\Pi_n$ is $\Pi_n$-complete if every
member of $\Pi_n$ is m-reducible to $B$.

There is a connection between the arithmetical hierarchy and the jump hierarchy;
for $n \geq 0$ the following hold: a) the set $\emptyset^{(n+1)}$ is $\Sigma_{n+1}$-complete; b) the class $\Sigma_{n+1}$
consists of exactly the $\emptyset^{(n)}$-c.e. sets; and c) any member of $\Delta_{n+1}$ is T-reducible to
$\emptyset^{(n)}$. In particular, $\Delta_1$ contains decidable sets, $\Sigma_1$ contains c.e. sets, and $\Pi_1$ contains
co-c.e. sets. Thus, the jump hierarchy is interleaved with the arithmetical hierarchy.

We can use the arithmetical hierarchy in proving the undecidability of a set (or 
a decision problem). To do this, we must express the set (or the language of the 
problem) as an arithmetical set. Then, its degree of undecidability is obtained from 
the index of the lowest arithmetical class that contains it. 


Problems 


Definition 15.5. (Graph of a Function) The graph of a partial function $\varphi$ is the relation
$graph(\varphi)$ defined by $(x,y) \in graph(\varphi) \Leftrightarrow \varphi(x) = y$.


15.1. Prove the following Graph Theorem.


Theorem 15.4. (Graph Theorem) A partial function $\varphi$ is p.c. $\Leftrightarrow$ $graph(\varphi)$ is c.e.


15.2. Prove:
(a) $\Sigma_n \subset \Sigma_m \cap \Pi_m$, for any $m > n$.
(b) $\Pi_n \subset \Sigma_m \cap \Pi_m$, for any $m > n$.
[Hint. Use Theorem 15.2.]

15.3. Prove:
(a) $A \in \Sigma_n \Rightarrow A \times A \in \Sigma_n$.
(b) $A, B \in \Sigma_n \Rightarrow A - B \in \Delta_{n+1}$.

15.4. Prove:
(a) $A, B \in \Sigma_n \Rightarrow A \cup B \in \Sigma_n$.
(b) $A, B \in \Sigma_n \Rightarrow A \cap B \in \Sigma_n$.
(c) $A, B \in \Pi_n \Rightarrow A \cup B \in \Pi_n$.
(d) $A, B \in \Pi_n \Rightarrow A \cap B \in \Pi_n$.
[Hint. $\forall y_1 \exists y_2 \ldots R \wedge \forall z_1 \exists z_2 \ldots S \equiv \forall y_1 \forall z_1 \exists y_2 \exists z_2 \ldots (R \wedge S)$.]

15.5. Prove:
(a) $A \leq_m B \wedge B \in \Sigma_n \Rightarrow A \in \Sigma_n$.
(b) $A \leq_m B \wedge B \in \Pi_n \Rightarrow A \in \Pi_n$.
[Hint. Let $x \in A \Leftrightarrow f(x) \in B$ where $f$ is a computable function. If $R(x, y_1, \ldots, y_n)$ is a
decidable relation, so is $R(f(x), y_1, \ldots, y_n)$.]


reader can also find the proofs that the sets $K$, $Fin$, $Inf$, $Cof$, $Tot$ and their complements are $\Sigma_n$- or
$\Pi_n$-complete for small $n$.


Part IV 
BACK TO THE ROOTS 


The crisis in the foundations of mathematics bequeathed fundamental questions: 
What is an algorithm? What is computing? What is computable? In Part I, we 
explained how these questions led to the birth of Computability Theory when the in- 
formally, intuitively understood concepts of “algorithm”, “computation”, and “com- 
putable function” were rigorously defined by the Computability (Church-Turing) 
Thesis. 

In Parts II and III we showed how this thesis opened the door to a mathemati- 
cal treatment of these intuitive concepts. In particular, we explained how the the- 
sis enabled the discovery of the universal Turing machine, which was soon phys- 
ically realized in the form of a general-purpose computer. Computability Theory 
continued its development in different directions and yielded many important theo- 
retical and practical discoveries about computation and its application. Simultane- 
ously, general-purpose computers kept improving and evolved into powerful com- 
puting machines capable of solving complex computational problems. Taking every- 
thing into account, the Computability (Church-Turing) Thesis brought about conse- 
quences that tremendously changed human knowledge and civilization. 

Realizing its immense importance, we will revisit the Computability (Church- 
Turing) Thesis in Part IV in much greater detail. There are good reasons for this 
decision: Due to the fast development of general-purpose computers and the recent 
rise of different proposals for new computing paradigms, some scientists have ques- 
tioned the adequacy of the thesis for the suggested paradigms, whether realistic (e.g., 
parallel or quantum computation) or speculative ones (e.g., hypercomputation). As a 
result, new versions of the thesis have recently emerged, each involving a certain no- 
tion or concept in the original thesis (e.g., physical implementability of algorithms, 
computational complexity, or physical computability). The necessity, adequacy, and 
relative power of each of the proposed versions are currently intensely discussed. 
Since these matters are in active discussion within the academic community, a well- 
grounded understanding of the Computability (Church-Turing) Thesis is needed in 
order to keep up with them. 


Chapter 16
Computability (Church-Turing) Thesis Revisited


With this concept [Turing computability] one has for the first
time succeeded in giving an absolute definition of an interesting
epistemological notion, i.e., one not depending on the formalism
chosen. (Gödel, 1946)


Abstract The Computability (Church-Turing) Thesis formalized the informal no- 
tions of computation and enabled their mathematical treatment. The thesis is by 
many viewed as an unproved or even unprovable proposition, although it has been 
subjected to continuous examination. Nevertheless, the consequences of this break- 
through are immense. Recent developments in computer science have brought some 
similar theses. We describe in greater detail the evolution of the thesis from its be- 
ginnings to some of its modern versions, and review some of their open questions. 


16.1 Introduction 


Hilbert, who in 1899 gave an axiomatization of geometry, reduced the question of
the consistency of geometry to the question of the consistency of arithmetic. To
prove the latter, he proposed using the kind of reasoning, called finitism, that would
only use finite mathematical objects and constructive methods, at least in principle.
Hilbert showed that finitism would reduce mathematical proving to mechanical
manipulation of finite strings of symbols, free of deceptive intuitive meaning.¹

The idea of mechanical manipulation of finite strings stimulated interest in
“procedures” (or “algorithms”²) that could mechanically and systematically (in
other words, effectively) carry out such manipulations.³

Another major part of Hilbert’s program concerned the Entscheidungsproblem,
the problem of finding a “procedure” for deciding the validity of any mathematical
statement. With such a decision “procedure” available, we could answer any
mathematical question in a purely mechanical manner, at least in principle.⁴


¹ This would bring us a kind of mathematical paradise in which creative thinking would no longer
be needed to pursue research. (This would have nothing to do with “Cantor’s paradise”, the
situation in mathematics just before paradoxes were discovered in Cantor’s naive set theory.)

² We use the terms “algorithm” and “effective procedure” interchangeably.

³ See Sects. 2.2.4, 3.1.1, 3.1.2, and 3.1.3.

⁴ See Sects. 4.1.1 and 4.1.2.


© The Author(s), under exclusive license to Springer-Verlag GmbH, DE,
part of Springer Nature 2020
B. Robič, The Foundations of Computability Theory, https://doi.org/10.1007/978-3-662-62421-0_16


Hilbert’s ideas were deeply founded on formal axiomatic systems that included
First-Order Logic (based on Principia Mathematica) and Formal Arithmetic (based
on Peano Arithmetic). Alas, in 1931 Gödel published his Incompleteness Theorems
about such formal axiomatic systems. He proved that if F is any such (consistent)
system, then (i) there are undecidable propositions in F, and (ii) the consistency of
F cannot be proved with F’s own means. Although Gödel’s discovery unexpectedly
brought Hilbert’s program to a close,⁵ the fall of Hilbert’s program bequeathed new
burning questions. These arose from the notion of a “procedure” that was used by
Hilbert and Gödel.

The notion of a “procedure” was informal, understood intuitively as “a kind of a
recipe, or plan, telling how a human can carry out a given task in a purely mechanical
way.” But after Gödel’s discovery it turned out that a rigorous, mathematically
precise, that is, formal definition of the concept of a “procedure” became indispensable.
Since a “procedure” could be viewed as a mapping of its inputs to its outputs,⁶
a formal characterization of the notion of a “computable” function became
indispensable too.


NB In this chapter we use quotation marks to refer to human informal, intuitive 
understanding of the notions of computing. For example, “procedure”, “finite and 
mechanical procedure”, “effective procedure”, “computable” number, “compu- 
table” function, “computation”, “creative thinking”, “insight”, “intuition”, “inge- 
nuity”, “process”, and “argument” were at the time only intuitively (and impreci- 
sely) defined. The exceptions to this rule will be quoted passages from other sources. 


16.2 The Intuitive Understanding of the Notion of a “Procedure” 


Generally speaking, the term procedure means a usual or correct way of doing
something; it is a series of actions conducted in a certain prescribed order and manner
to achieve something. The informal concept of a mathematical procedure, i.e., the
“procedure” that Hilbert and Gödel were concerned with, complied with this definition,
but it also bore its own characteristics. Namely, the concept of a “procedure”
implicitly emerged with Euclid in ancient Greece, and gradually acquired, especially
during the first two decades of the 1900s, specific additional connotations of some
concepts relevant to mathematical problem solving. As a result, at the time of Hilbert
and Gödel it became required that mathematical “procedures” were “effective”.

The intuitive, common-sense understanding of the idea of an “effective procedure”
was formulated informally, such as in the following definition.


⁵ See Sects. 4.2.3 and 4.2.5.

⁶ Accordingly, this mapping is treated extensionally, in the sense that it is determined by the set of
all ordered pairs of its arguments and values, and not by the particular way in which it is defined.


Definition 16.1. (“Effective procedure”) A “procedure” for achieving some
mathematical result is said to be “effective” if it


1. consists of a finite number of exact instructions;
2. produces the result in a finite number of steps, if carried out without error;
3. can be carried out by a human aided only by paper and pencil;
4. demands from the human no “creative thinking” (“insight, intuition, ingenuity”).


The definition is informal because it doesn’t precisely tell us what the instructions
are and what “creative thinking” is. This is not surprising: Humans were (and still
are) perfectly aware that human understanding of the term “creative thinking” and its
mental phenomena such as “insight”, “intuition”, and “ingenuity” are only founded
on humans’ limited experience and deceptive intuition. No mathematically precise
characterization of these terms seemed (and still seems) to be within human reach.


16.3 Toward the Thesis 


The main protagonists of the search for a precise mathematical characterization of
the idea of an “effective procedure” were Gödel, Church, and Turing; others also
contributed, most notably Herbrand, Post, Kleene, and Rosser. In the following we
will describe in greater detail the events following the year 1931 through which
these young⁷ mathematicians and logicians endeavored to answer the seemingly
undemanding questions:


What exactly is a mathematical “procedure” ? 
What exactly is an “effective procedure” ? 
When exactly is something “computable” ? 


And after all, What exactly is “computing” ? 


16.3.1 Gödel


After publishing his famous Incompleteness Theorems in 1931, Gödel continued his
research aiming to extend the theorems to formal mathematical systems in general. 
In the course of this research he realized that a precise definition of “computable” 
functions and “procedures” that compute the values of such functions was necessary 
in order to mathematically secure both his general definition of a formal axiomatic 
system and the generalization of his incompleteness theorems. 


⁷ In 1931, Turing was 19, Kleene 22, Herbrand 23, Rosser 24, Gödel 25, Church 28, and Post 34.


Gödel at Princeton


In 1934, during the period February through May, Gödel delivered a series of
tures at the Princeton Institute for Advanced Study, where he explicated undecidable 
propositions of formal axiomatic systems in general. He started with the following 
definition of a formal axiomatic system: 


We require that the rules of inference, and the definitions of meaningful formulas and ax- 
ioms, be constructive; that is, for each rule of inference there shall be a finite procedure for 
determining whether a given formula B is an immediate consequence (by that rule) of given 
formulas $A_1, \ldots, A_n$, and there shall be a finite procedure for determining whether a given
formula A is a meaningful formula or an axiom. 


In this passage he assumed the existence of two “procedures”, the first one for de- 
ciding whether or not a given formula directly follows from other given formulas, 
and the second one for distinguishing axioms from other well-formed formulas. 
In addition to being “finite”, the “procedures” should be “mechanical”, in the sense 
he explained for the first “procedure” a year earlier: 


[The] outstanding feature of the rules of inference [is] that they are purely formal, i.e., refer 
only to the outward structure of the formulas, not to their meaning, so that they could be 
applied by someone who knew nothing about mathematics, or by a machine. 


Gödel was of course well aware that an irreproachable development of his ensuing
generalized incompleteness theorems would require an exact characterization of the 
concept of the “finite and mechanical procedure”. As he later explained: 


When I first published my paper about undecidable propositions the result could not be pro- 
nounced in this generality, because for the notions of mechanical procedure and of formal 
system no mathematically satisfactory definition had been given at that time. [...] The es- 
sential point is to define what a procedure is. 


But at the time of his 1934 Princeton lectures he had no such characterization.
Neither did he have a precise idea of what could be “computed” by such “procedures”.
Indeed, when speaking of functions, before he introduced the new concept
of a general⁸ recursive function, he reviewed the primitive⁹ recursive functions,
which he had introduced in 1931, and stated the easy observation that primitive
recursive functions are “computable” (can be “computed” by “finite and mechanical
procedures”):


[Primitive] [r]ecursive functions have the important property that, for each given set of val- 
ues of the argument, the value of the function can be computed by a finite procedure. 


To this, however, he added a footnote stating a corresponding hard observation, 
namely that to show the converse (i.e., that the “computable” functions are general 
recursive, or recursive of the most general kind) would be quite a different matter: 


8 Gödel developed general recursive functions from Herbrand’s ideas; see Sect. 5.2.1, p. 84.
9 See Sect. 5.2.1, p. 81.


The converse seems to be true, if, besides [primitive] recursions [...] recursions of other
forms (e.g., with respect to two variables simultaneously) are admitted. This cannot be 
proved, since the notion of finite computation is not defined, but it serves as a heuristic 
principle. 


Gödel’s footnote indicates that he believed that the class of functions obtainable by
recursions of the most general kind (possibly the general recursive functions) was
the same as the class of “computable” functions. However, as he clarified later, at
the time of the Princeton lectures he was not at all convinced that his and
Herbrand’s concept of general recursion encompassed all possible recursions; in
addition, at the time of his lectures the equivalence between his general recursiveness
and μ-recursiveness, which was then being investigated by Kleene (see below), was
not trivial. Thus, Gödel’s footnote above did not anticipate Church’s Thesis (which
will be described shortly).
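Gödel’s footnote mentions recursions “with respect to two variables simultaneously”. A standard textbook illustration of such a recursion is the Ackermann-Péter function, which is computable but not primitive recursive. The sketch below is only an illustration under that assumption; the quoted footnote does not name a particular function, and the identifier `ackermann` is ours.

```python
def ackermann(m: int, n: int) -> int:
    """Ackermann-Peter function: the recursion runs on both arguments at once."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    # The inner call recurses on n, the outer call on m.
    return ackermann(m - 1, ackermann(m, n - 1))


print(ackermann(2, 3))  # 9
print(ackermann(3, 3))  # 61
```

Only small arguments are practical here: the values, and the recursion depth, grow extremely fast with m.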


In sum, in 1934 Gödel had the concept of general recursiveness, but he was not
convinced that general recursiveness is the recursiveness of the most general kind,
nor did he believe that it could be rigorously proved that this recursiveness captures
the informal notion of functions “computable by finite and mechanical procedures”.


16.3.2 Church 


At Princeton three years earlier, Church had begun an attempt to develop a logical
system in which the notion of a function would play a fundamental role. In the context
of the system, the λ-notation¹⁰ arose in a very natural way. In this notation he
defined the well-formed expressions (λ-terms) and the operations (β-reductions) that
transform λ-terms without altering their meanings. Church identified positive
integers with certain λ-formulas, called Church numerals. Then he defined a function of
positive integers to be λ-definable if the function is representable by a λ-term that is
β-reducible to a Church numeral exactly when the numeral represents the function’s
value at a given number represented by the corresponding Church numeral. In 1931,
he had just two λ-definable functions, the successor function σ(n) = n + 1 and the
sum function m + n (and, in a certain way, the functions m − n and m × n).
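To make the idea of Church numerals concrete, here is a minimal sketch that encodes them as Python lambdas. It rests on assumptions not in the text: it uses the modern convention in which numerals start at zero (Church worked with positive integers), Python’s eager evaluation stands in for β-reduction, and the names `zero`, `succ`, `add`, `mul`, `church`, and `to_int` are ours.

```python
# The Church numeral for n maps f to "f applied n times".
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mul = lambda m: lambda n: lambda f: m(n(f))


def church(k: int):
    """Build the Church numeral for a non-negative Python int k."""
    numeral = zero
    for _ in range(k):
        numeral = succ(numeral)
    return numeral


def to_int(numeral) -> int:
    """Decode a Church numeral by counting how often it applies its argument."""
    return numeral(lambda k: k + 1)(0)


print(to_int(succ(church(4))))            # 5
print(to_int(add(church(2))(church(3))))  # 5
print(to_int(mul(church(2))(church(3))))  # 6
```

Each arithmetic function is itself just a λ-term; applying it to numerals and reducing yields the numeral of the result, which is the sense of λ-definability described above.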

It was obvious that every λ-definable function is “computable” (or, in Church’s
wording, “effectively calculable”), since the “effective procedure” for obtaining the
function’s values is implicit in the function’s λ-term. But Church speculated
that the converse may also be true, i.e., that every “computable” function is
λ-definable. So he set his doctoral student Kleene to investigate the λ-definability
of some particular “computable” functions. Kleene exceeded Church’s expectations:
By 1934, he had proved that all the usual “computable” number-theoretic functions are
λ-definable.


10 See Sect. 5.2.1, p. 85.


Church’s Thesis (1934)


Encouraged by this evidence and by his own intuition, Church presented to Gödel,
who was at the time at Princeton giving his lectures as mentioned above, a thesis
on “computable” functions. In the thesis he proposed that λ-definability of functions
be adopted as the formal characterization of their “computability”. In other words,
Church proposed that function “computability” simply be identified with (or, defined
as) function λ-definability. Here is his thesis.


Church’s Thesis (1934) A function is “computable” iff it is λ-definable.


If we denote by F the informal class of all “computable” functions, by C the
class of all Church’s λ-definable functions, and by a := b the assignment operation
“a becomes b”, then Church’s Thesis postulated that


F := C.


Remark. Kleene, who also heard of the thesis, was doubtful of it. He tried right away to disprove 
it by diagonalization.¹¹ However, he unexpectedly found that the diagonalization procedure failed
to produce a contradiction. The surprising incapability of diagonalization in the case of Church’s 
Thesis sufficed to convince Kleene and turned him into a supporter of the thesis. 


Doubts about Church’s Thesis (1934) 


Gödel, by contrast, was not convinced by the available evidence, and he rejected
Church’s Thesis. Church described this event in a letter to Kleene: 


In regard to Gödel and the notions of recursiveness and effective calculability, the history
is the following. In discussion with him the notion of lambda-definability, it developed that 
there was no good definition of effective calculability. My proposal that lambda-definability 
be taken as a definition of it he regarded as thoroughly unsatisfactory. 


Church’s immediate reply to Gödel was a proposal to test λ-definability further:


I replied that if he would propose any definition of effective calculability which seemed even
partially satisfactory I would undertake to prove that it was included in lambda-definability. 


However, Gödel’s counter-suggestion to Church was to focus on “computability”:


His [Gödel’s] only idea at the time was that it might be possible, in terms of effective
calculability as an undefined notion, to state a set of axioms which would embody the 
generally accepted properties of this notion [effective calculability], and to do something 
on that basis. 


11 See Sect. 5.3.3, p. 100 and Sect. 9.1.


We can see that Gödel’s belief was that (i) the notion of “computability” should
be better understood by further investigations; (ii) if some of the discovered prop- 
erties of “computability” became generally accepted, then these properties could 
serve as postulates of an informal theory of “computability”; and (iii) such a theory 
might reveal an appropriate formalization of the notion of “computability”. 


Church’s Thesis (1936) 


Church, however, went on to publicly announce his thesis. In 1935, he gave a talk 
at a meeting of the American Mathematical Society, saying: 


Following a suggestion of Herbrand, but modifying it in an important respect, Gödel has
proposed (in a set of lectures at Princeton, N.J., 1934) a definition of the term [general]
recursive function, in a very general sense. In this paper a definition of the [general] recursive
function of positive integers which is essentially Gödel’s is adopted. And it is maintained
that the notion of an effectively calculable function of positive integers should be identi- 
fied with that of a [general] recursive function, since other plausible definitions of effective 
calculability turn out to yield notions which are either equivalent to or weaker than recur- 
siveness. 


In 1936, he published a paper in the American Journal of Mathematics stating: 


[In §1:] The purpose of the present paper is to propose a definition of effective calculability 
which [the definition] is thought to correspond satisfactorily to the somewhat vague intuitive 
notion [effective calculability] in terms of which problems [...] are often stated [... ] 


[In footnote 3:] As will appear, this definition of effective calculability can be stated in either 
of two equivalent forms, (1) that a function of positive integers shall be called effectively 
calculable if it is λ-definable [...], (2) that a function of positive integers shall be called
effectively calculable if it is [general] recursive [...]


[In §6:] THEOREM XVI. Every [general] recursive function of positive integers is
λ-definable. THEOREM XVII. Every λ-definable function of positive integers is [general]
recursive. [...]


[In §7:] We now define the notion, already discussed, of an effectively calculable function 
of positive integers by identifying it with the notion of a [general] recursive function of
positive integers (or of a λ-definable function of positive integers). This definition is thought
to be justified by the considerations which follow, so far as positive justification can ever be 
obtained for the selection of a formal definition to correspond to an intuitive notion. [... ] 


The reader may have noted that in the above announcements Church formulated
his thesis in terms of Gödel-Herbrand’s (general) recursiveness, rather than
in terms of his own λ-definability. Why did he do so? There were three reasons
for this change: (i) according to Kleene, there were rather chilly receptions from
audiences around 1933-1935 to disquisitions on λ-definability; (ii) Church and
Kleene had each already proved that λ-definable functions are (general) recursive
(see THEOREM XVII); (iii) in 1936, Kleene also proved that (general) recursive
functions are λ-definable (THEOREM XVI). Therefore, at the time of submitting
his paper, Church already knew that λ-definability and (general) recursiveness are
formally equivalent. The reader should also take note that Church’s “computability”
as defined in the above passage refers to functions of positive integers only.
Accordingly, Church proposed in effect the following new version of his thesis.


Church’s Thesis (1936) A function of positive integers is “computable” iff 
it is (general) recursive (or, equivalently, λ-definable).


If we denote by F⁺ the informal class of all “computable” functions of positive integers,
by G⁺ the class of all Gödel’s (general) recursive functions of positive integers,
and by C⁺ the class of all Church’s λ-definable functions (of positive integers), then
the second version of Church’s Thesis is


F⁺ = G⁺ (= C⁺).


This version too postulates that the vaguely defined class F⁺ coincides with the
precisely defined class G⁺ (and with C⁺ due to the proved confluence G⁺ = C⁺).


Doubts about Church’s Thesis (1936) 


We have seen that, in his second thesis, Church defined the class F⁺ of “computable”
functions of positive integers to coincide with a particular class of functions, G⁺ (or
C⁺). Post, however, was strongly opposed to speaking of Church’s Thesis as a
definition; he viewed the thesis as a working hypothesis, which needs to be continually
verified. Namely, for any new function f ∈ F⁺ that one might conceive, one would
still need to prove that f ∈ G⁺ (or f ∈ C⁺), because f’s membership in G⁺ (or C⁺)
does not logically follow from Church’s Thesis, which is just an arbitrarily
postulated definition. On account of this, Post criticized Church for masking this
hypothesis in the guise of a definition:


And to our mind such [working hypothesis] is Church’s identification of effective calcula- 
bility with recursiveness. [...] Actually the work already done by Church and others carries 
this identification considerably beyond the working hypothesis stage. But to mask this iden- 
tification under a definition hides the fact that a fundamental discovery [...] has been made 
and blinds us to the need of its continual verification. 


Gödel too remained reluctant to accept G⁺ = C⁺, the confluence of (general)
recursiveness and λ-definability, as decisive new evidence for Church’s Thesis. He
was still demanding a deeper understanding of the ideas of “computable” function 
and “effective procedure” computing it, an understanding that would help to identify 
those properties of the two ideas that could be generally accepted as characteristic 
of them. 
16.3.3 Kleene


Kleene was Church’s doctoral student at Princeton and contributed significantly to
the birth and development of Computability Theory. In 1935-1936, Kleene discovered
his own formulation of the (general) recursive functions. How did he do that?
First, in 1934 he found that Gödel’s primitive recursiveness can be augmented by
a new rule of function construction, which he called the μ-operation. The functions
constructible in this way are now called the μ-recursive functions.¹² Then
Kleene discovered his Normal Form Theorem, which asserts that every (general)
recursive function φ can be constructed from two primitive recursive functions T
and U (where T is a predicate) by a single application of the μ-operator.


Theorem 16.1. (Normal Form Theorem) There is a primitive recursive predicate
T(e, x, y) and a primitive recursive function U(y) such that, for any (general)
recursive function φ(x), a number e can be found such that φ(x) = U(μy T(e, x, y)).


Remark. Here the predicate T(e, x, y) is true iff y is the code of some computation of the value
φ(x) and e represents the system E(φ) of equations defining the function φ (see Sect. 5.2.1, p. 84).
The function U extracts the value φ(x) from y. Thus, the theorem states that the value φ(x) of any
(general) recursive function φ can be computed as follows: Find the smallest code y in the set of
all encoded computations of the value φ(x) (here the function φ is defined by the system E(φ) of
equations whose index is e) and then extract the computed value φ(x) from y.
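The μ-operator used in the theorem is unbounded minimization: μy R(y) is the least y for which the predicate R holds. The sketch below illustrates only this search pattern, not Kleene’s actual T and U, which require a coding of computations. The helper names `mu` and `floor_sqrt` are ours, and the simple arithmetic predicate stands in for T purely as an assumption for the demonstration.

```python
from itertools import count
from typing import Callable


def mu(predicate: Callable[[int], bool]) -> int:
    """Return the least y >= 0 with predicate(y) true.

    If no such y exists the search never ends, which is how
    minimization gives rise to partial functions.
    """
    for y in count():
        if predicate(y):
            return y


def floor_sqrt(x: int) -> int:
    # floor(sqrt(x)) is the least y such that (y + 1)^2 > x.
    return mu(lambda y: (y + 1) * (y + 1) > x)


print(floor_sqrt(10))  # 3
print(floor_sqrt(16))  # 4
```

In the Normal Form Theorem the same search runs over codes y of computations, and U then reads the result off the code that was found.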


By the Normal Form Theorem every (general) recursive function is μ-recursive.
Kleene also proved the converse: Every μ-recursive function is (general) recursive.
So he proved the equivalence of (general) recursiveness and his μ-recursiveness,
i.e.,


G⁺ = K⁺,


where we denote by K⁺ the class of all μ-recursive functions (of positive integers).


16.3.4 Rosser 


Rosser was Church’s doctoral student at Princeton and contributed significantly to
the development of the λ-calculus. In collaboration with Kleene he proved that the
original λ-calculus, which Church introduced in 1932, was inconsistent.¹³ In 1936,
in collaboration with Church, he proved the Church-Rosser Theorem,¹⁴ which states
that if a λ-term can be β-reduced to two different λ-terms, then there is a λ-term to
which both λ-terms can be β-reduced. This means that if two different computations
of a λ-defined function terminate, they return essentially the same result (equal up
to renaming of bound variables). Besides this,
Rosser also made important discoveries in other fields of mathematics.¹⁵


12 See Sect. 5.2.1, p. 81.

13 Nevertheless, the λ-calculus later developed into a calculating system that now has a strong impact
on some fields of computing, e.g., functional programming languages.

14 See Box 5.2, p. 86.

15 Rosser proved that the requirement for ω-consistency in Gödel’s First Incompleteness Theorem
can be weakened to the requirement for usual consistency. (See Sects. 5.2.1, p. 85 and 4.2.4, p. 66.)


16.3.5 Post


At the age of twelve, Post lost his left arm below the shoulder in an accident when
he reached for a lost ball under a parked car and a second car crashed into it. This
changed his childhood ambitions from astronomy to mathematics. After receiving
a B.S. in mathematics from City College, he was a doctoral student at Columbia
University from 1917 to 1920. During this time he became interested in modern
mathematical logic, and he completed his doctoral studies with a dissertation in this area.
A shortened version of the dissertation was published in 1921, with its main result
being the first published proof of completeness and decidability of the propositional
calculus of Principia Mathematica (PM).¹⁶ In 1920-1921, Post held a postdoctoral
fellowship at Princeton University. In the course of the fellowship, independently of
Hilbert’s formalist program, he created his own research project. It was as part of this
project that Post made astounding discoveries in logic that anticipated the work that
was done on incompleteness and undecidability a decade and a half later in different
ways by Gödel, Church, Turing, and others.¹⁷ Alas, the excitement caused by
his discoveries precipitated Post’s first attack of bipolar disorder, a lifelong condition
that heavily affected him. To avoid undue excitement, he developed a routine
that restricted his time spent on research to just three hours a day (4–5 p.m. and
7–9 p.m.). Due to this regimen and illness, he failed to promptly publish his ideas;
indeed, it was only after other researchers had caught up with his discoveries that
he became able to publish. Thus, from 1935 to his death in 1954, he published some
of his most remarkable papers, with his wife assisting him by typing his papers and
letters. Since Post’s point of view and ideas were substantially different
from the approaches of other logicians, these papers are highly original and
influential. For example, in contrast to other researchers, Post never emphasized
“computable functions”. So when he read the paper presenting Church’s Thesis (1936),
he retained the point of view adopted in his earlier work (f.a.s. expressed as rules
for string manipulation). Accordingly, he chose to analyze the intuitive notion of
a “computing process”, rather than that of a “computable function”. As a result, he
proposed in 1936 his model of computation, called the finite combinatory process.¹⁸


16 In this paper (see [180]), Post stressed that the authors of the PM adopted a restricted view and
methods whereby they could develop the logical foundation of mathematics in a fixed axiomatic 
system. This resulted in a logical formalism that lacks universality and avoids any considerations of 
the system itself. To recover generality, Post adopted a metamathematical view of the PM. He de- 
veloped a logical method (similar to formalism) by allowing mostly constructive methods (though 
he didn’t explicitly restrict to finitary methods). Conforming to this view and methods, he then 
introduced the now familiar truth-tables and truth-table method. Then he proved his Fundamental 
Theorem, which states that a propositional function is assertible iff it is a tautology. This provided 
both a completeness theorem and a decision procedure for the propositional calculus of the PM. 

17 In 1920-1921 (see [182]), Post investigated abstractions of the classical propositional logic,
which he called the postulate systems. He defined the notion of reduction of a system to another 
system; then he defined a series of reductions of systems through which he reduced a decision 
problem to a system in normal form (an extremely simple postulate system); next, by applying a 
diagonalization argument he discovered that the decision problem for systems in normal form is 
unsolvable; from this he finally deduced his incompleteness results. 


18 Today, the standard terminology for the model is Post machine.


Finite Combinatory Processes


A computational problem (called a general problem by Post) is a class of problem
instances (called specific problems). The computation of the solution to a given
problem instance is carried out by what Post called the problem solver (or worker).
The idea is that the problem solver has at his disposal a workspace and a program, 
which Post called the symbol space and the set of directions, respectively. 

The workspace is a two-way infinite sequence of boxes where the work leading 
from a problem instance to its solution is carried out. A box can be either empty or 
marked (i.e., contains a single symbol, say the stroke |). One box is singled out and 
called the starting box (starting point by Post). 

The problem solver is assumed to be capable of moving and working in his 
workspace with the proviso that he can observe and act on just one box at a time. 
He can perform any of the following five primitive acts: (a) mark the observed 
empty box; (b) erase the mark in the observed marked box; (c) move to the box 
on his right; (d) move to the box on his left; and (e) determine whether the observed 
box is marked or empty. The problem solver does not act freely; he is assumed to 
strictly follow the associated program and perform the prescribed primitive acts. 

The program is identical for all instances of the problem and consists of a head- 
ing and a body. The heading instructs the problem solver to observe the starting 
box and follow instruction 1 of the body. The body consists of a finite sequence of 
instructions (called directions by Post) numbered 1,2,3,...,s. The ith instruction 
is of one of the following three forms: 


• i: oᵢ jᵢ ... perform primitive act oᵢ ∈ {a, b, c, d} and follow instruction jᵢ;
• i: oᵢ jᵢ′ jᵢ″ ... perform primitive act oᵢ = e and, depending on whether the answer
  is YES or NO, follow instruction jᵢ′ or jᵢ″, respectively;
• Stop.


How are Post’s ideas related to Church’s and Gödel’s attempts at formalizing the
notion of a “computable” function? Let P be a particular problem solver (with a fixed
program). Post observed that P induces a function f_P : ℕ → ℕ defined as follows. If
a given input n ∈ ℕ is represented in unary by a sequence of n marked boxes, starting at
the starting box, and P’s program halts on input n, leaving in P’s workspace exactly
m ∈ ℕ marked boxes, then let f_P(n) = m. Next he defined: A function f for which
there exists a problem solver P such that f = f_P is said to be Post-computable.¹⁹
Finally, he anticipated:


The writer expects the present formulation [Post computability] to turn out to be logically 
equivalent to recursiveness in the sense of the Gödel-Church development.


If we denote by P⁺ the class of Post-computable functions (of positive integers),
then Post foresaw the equivalence of general recursiveness and his computability,
that is,

G⁺ = P⁺.
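The finite combinatory process described in this subsection can be sketched as a small interpreter. This is a minimal illustration under stated assumptions, not Post’s own formulation: the tuple encoding of instructions, the names `run_post` and `SUCCESSOR`, the step budget, and the choice to raise an error when act (a) meets a marked box or act (b) an empty one are all ours. The function returns the number of marked boxes left on halting, i.e., the value of the induced function f_P on input n.

```python
def run_post(program, n, max_steps=10_000):
    """Run a Post-style program on unary input n; return the number of marks left.

    program maps an instruction number to one of
      ("a" | "b" | "c" | "d", j)  perform the act, then follow instruction j
      ("e", j_yes, j_no)          test the observed box, then branch
      ("stop",)                   halt
    """
    marked = set(range(n))  # boxes 0..n-1 are marked; box 0 is the starting box
    pos, i = 0, 1           # heading: observe the starting box, follow instruction 1
    for _ in range(max_steps):
        instr = program[i]
        act = instr[0]
        if act == "stop":
            return len(marked)
        if act == "e":
            i = instr[1] if pos in marked else instr[2]
            continue
        if act == "a":
            if pos in marked:
                raise ValueError("act (a) requires an empty box")
            marked.add(pos)
        elif act == "b":
            if pos not in marked:
                raise ValueError("act (b) requires a marked box")
            marked.remove(pos)
        elif act == "c":
            pos += 1
        elif act == "d":
            pos -= 1
        else:
            raise ValueError(f"unknown act {act!r}")
        i = instr[1]
    raise RuntimeError("step budget exhausted; the program may not halt")


# Hypothetical example program: walk right past the marks, then add one more.
SUCCESSOR = {
    1: ("e", 2, 3),  # marked? keep walking : go and mark
    2: ("c", 1),     # move right, test again
    3: ("a", 4),     # mark the first empty box
    4: ("stop",),
}

print(run_post(SUCCESSOR, 3))  # 4
```

With this program the induced function is f_P(n) = n + 1, so the successor function is Post-computable in the sense defined above.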


19 In fact Post called such a function the 7-function, a term based on the notions in his 1936 paper.


16.3.6 Turing


In England, independently and unaware of the related research being done at the
time at Princeton, Turing characterized the informal notion of human “computation”
in terms of an abstract computing machine. In short, his principal idea was to
show that any “process” that a human could carry out during his “computation” can
be analyzed into a succession of simple operations of a particular abstract computing
machine. Turing’s approach to such a characterization of “computation” was to
(i) start with an ideal human²⁰ carrying out a “computation” of a number, (ii) eliminate
all the irrelevant details of the relevant observable “processes” making up such
human “computations”, and (iii) carry out the obtained simplified “processes” by his
machine. In this way Turing gave arguments that his machines could compute any
“computable” number and, consequently, the value of any “computable” function.

The machine which Turing conceived for simulating human “computation” he
called the a-machine (for automatic machine).²¹ We will call it the Turing machine.
In 1936, Turing published his ideas and their highly influential applications²² in the
Proceedings of the London Mathematical Society.

We now comment on some of the sections of this paper in which Turing devel- 
oped his characterization of the basic notions of computing. 


NB From now on and until p. 333 the subject will be “computation” of numbers. 


Turing’s Thesis (numbers) 


Turing formulated two theses. In the first, which we call Turing’s Thesis (numbers), 
he compared the numbers computed by his machine with the numbers “computed” 
by a human; in the second thesis, called Turing’s Thesis (operations), he compared 
the operations that make up machine computations of numbers with the “operations” 
which make up human “computations” of numbers. From the two theses the usual 
Turing’s Thesis (functions) can easily be synthesized. This thesis links the functions 
whose values can be computed by a Turing machine with those whose values can 
be “computed” by a human. We now start with the first thesis. 


20 A human is ideal because his computation is not bounded by practical limits in time, space, or 
resources. An ideal human is error-free, immortal, free of boredom and fatigue, and not troubled 
by insufficiency of paper, pencils, or any other simple tool needed in his computation. 
21 Later, after some practical adaptations by Post, the a-machine was renamed the Turing machine.
22 Using his ideas, Turing discovered the universal computing machine [§§6,7] (see Sect. 6.2);
proved the undecidability of the Halting Problem [§8] (see Sect. 8.2) and the unsolvability of the
Entscheidungsproblem [§11] (see Sect. 9.2.4); and outlined a proof that, over positive integers, his
computability and Church’s λ-definability are equivalent [Appendix]. Regarding the Appendix,
Kleene later wrote:

Turing learned of the work at Princeton on λ-definability and general recursiveness just as
he was ready to send off his manuscript, to which he then added an appendix outlining a
proof of the equivalence of his computability to λ-definability.


Turing-computable numbers. Initially, Turing defined a new notion, that of a
computable number, which we will right away rename a Turing-computable number. 
Here is the definition: 


According to my definition, a number is [Turing-] computable if its decimal [representation] 
can be written down [digit by digit] by a machine. 


So Turing-computable numbers are those whose decimal representations can be 
generated progressively, digit by digit, by a machine. At this point, Turing said noth- 
ing about the machine that would be capable of writing down the numbers being 
computed by it. This he would do shortly and there it would become clear that some 
Turing-computable numbers can be real, not just positive integers (as was the case 
with Church’s Thesis). 
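The phrase “generated progressively, digit by digit” can be illustrated with an ordinary program that emits the decimal digits of a real number one at a time. The sketch below does this for √2 using exact integer arithmetic. It is a Python generator, not a Turing machine, and the choice of √2 and the name `sqrt2_digits` are our assumptions for the illustration.

```python
from itertools import islice
from math import isqrt
from typing import Iterator


def sqrt2_digits() -> Iterator[int]:
    """Yield the decimal digits of sqrt(2) = 1.41421..., one per step, forever."""
    k = 0
    while True:
        # isqrt(2 * 10**(2k)) equals floor(sqrt(2) * 10**k); its last digit is digit k.
        yield isqrt(2 * 10 ** (2 * k)) % 10
        k += 1


print(list(islice(sqrt2_digits(), 8)))  # [1, 4, 1, 4, 2, 1, 3, 5]
```

The generator never terminates on its own; any finite prefix of the expansion is obtained after finitely many steps, which is all that the definition of a computable number requires.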


Next, Turing involved in his consideration the (intuitively understood) “computable” 
numbers. Actually, he called them numbers “which would naturally be regarded as 
computable.” (Clearly, these numbers coincide with Church’s “effectively calculable”
numbers and also with Gödel’s numbers “computable by finite and mechanical
procedures.”) 


Finally, Turing declared his intention to show that the set of all Turing-computable
numbers, T, contains the set N of all “computable” numbers.²³ Here is his claim:


In [later sections] I give some arguments with the intention of showing that the [Turing-]
computable numbers [set T] include all numbers which would naturally be regarded as
[humanly] computable [set N].


This claim is today sometimes called Turing’s Thesis (in terms of numbers). 


Turing’s Thesis (numbers) The set T of Turing-computable numbers contains
the set N of “computable” numbers; that is, N ⊆ T.


Turing’s Thesis (operations) 


Turing machine. In the next step, Turing defined an abstract mechanical machine
that would be capable of writing down progressively, digit by digit, decimal
representations of the numbers being computed by it. Here is the passage of his paper
describing the workings of the machine:²⁴


23 So N is the set of “computable” numbers and F is the class of “computable” functions.
24 Instead of Turing’s original terms “m-configuration”, “square”, and “configuration” we now use
the terms “state”, “cell”, and “the pair (current state, currently scanned symbol)”, respectively.
In the passage, we indicate the present-day terms in square brackets.


[(i)] We may compare a man in the process of computing a real number to a machine which
is [(ii)] only capable of a finite [not infinite] number of conditions [...] which will be
called “m-configurations” [states]. The machine is supplied with a “tape” (the analogue of
paper) running through it, and divided into sections (called “squares”) [cells] each capable
of bearing a “symbol”. [(iii)] At any moment there is just one [bounded number]
square [cell] [...] bearing the symbol [...] which is “in the machine” [currently scanned]. 
[(iv)] The “scanned symbol” is the only one of which the machine is [...] “directly aware.” 
However, by altering its m-configuration [state] the machine can effectively remember some 
[just finitely many] of the symbols which it has “seen” (scanned) previously. [(v)] The 
possible behaviour of the machine at any moment is determined by the m-configuration 
[state] [...] and the scanned symbol. [...] This pair [the pair (current state, currently 
scanned symbol)] [...] will be called the “configuration”. In some of the configurations 
in which the scanned symbol is blank [...] the machine writes down a new symbol on the 
scanned square [cell]: in other configurations it erases the scanned symbol. The machine 
may also change the square [cell] which is being scanned, but only by shifting the tape one 
[bounded number] place to the right or left. In addition to any of these operations the
m-configuration [state] may be changed. [(vi)] Some of the symbols written down will form
the sequence [...] which is the decimal of the real number [not just positive integer] which 
is being computed. [(vii)] The others are just rough notes [e.g., intermediate results] to 
“assist the memory”. It will only be these rough notes which will be liable to erasure. 
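The machine described in the passage can be sketched as a table-driven simulator: each step looks up the pair (state, scanned symbol) and obtains a symbol to write, a move, and a next state. The example table is modelled on the first example machine of Turing’s 1936 paper, which prints 0 and 1 on alternate cells and leaves a blank cell between digits; the encoding of the table, the moving head (instead of a moving tape), and the names `RULES` and `run` are our assumptions.

```python
from collections import defaultdict

BLANK = " "

# (state, scanned symbol) -> (symbol to write, head move, next state)
RULES = {
    ("b", BLANK): ("0", +1, "c"),
    ("c", BLANK): (BLANK, +1, "e"),
    ("e", BLANK): ("1", +1, "f"),
    ("f", BLANK): (BLANK, +1, "b"),
}


def run(rules, state, steps):
    """Run the machine for a fixed number of steps and return the tape contents."""
    tape = defaultdict(lambda: BLANK)  # unvisited cells read as blank
    head = 0
    for _ in range(steps):
        write, move, state = rules[(state, tape[head])]
        tape[head] = write
        head += move
    return "".join(tape[i] for i in range(max(tape) + 1))


print(run(RULES, "b", 8).replace(BLANK, ""))  # 0101
```

This machine never halts; it keeps extending the sequence 0101..., so the simulator is given a step count and shows only the digits written so far.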


In the following comment on the quoted passage, we will include in several 
places Turing’s tacit assumptions about humans and their “computations”. These 
assumptions do not explicitly appear in the quoted passage. Turing explicated them 
later, so the motivation for his particular definition of the machine becomes clear only
after reading the rest of the paper.

So, in the quoted passage, Turing revealed several issues: (i) In defining this ma- 
chine, Turing had in mind a human equipped with a pen and paper in the process 
of “computing”. (ii) He attributed to the machine a finite number of memory 
states (tacitly assuming that human memory is limited) and a one-dimensional tape 
divided into cells each capable of containing a symbol (tacitly assuming that the 
usual two-dimensional paper used by a human could easily be replaced, without 
loss of generality, by one-dimensional paper tape). (iii) Turing defined that, at 
any step of the computation of his machine, there is just one cell on the tape whose 
symbol is currently scanned by, and hence known to, the machine (tacitly assum- 
ing that humans too can observe, and be aware of, only finitely many symbols at 
a time). (iv) Turing pointed out that finitely many previously scanned symbols
could be memorized by encoding them in the machine’s states.²⁵ (v) Furthermore,
the action (i.e., local behavior²⁶) of the machine must, at any step of its computation,
be uniquely determined by the pair consisting of the current state of
the machine and the symbol which the machine currently scans. Based on this, the 
machine makes three simple operations: (a) erases the scanned symbol or writes a 
new one to the scanned cell; (b) shifts the tape to one of the two adjacent cells; 
and (c) changes its state. (Here Turing tacitly took into account that a human too 
decides, depending on his current “state of mind” and the symbols he currently ob- 
serves, how to deal with these symbols; which not too distant symbols on the paper 


25 See Sect. 6.1.2, p. 117 and Sect. 6.1.3, p. 119. 
26 See Sect. 7.2. 
to observe next; and how to prepare his “state of mind” for dealing with them.) (vi)
If the computation of the machine terminates, then the number computed by the ma- 
chine is left on the tape as a sequence of symbols. Note that the computed number 
can be real, not just a positive integer. (vii) During the computation, the machine 
can also use its tape for writing down temporary results and other notes. In this way 
the machine can avoid memorizing (finitely many) tape symbols by encoding them 
in its states. (viii) Finally, the defined machine is finite, in the sense that it uses, at 
any step of its computation, only finite means: working components of finite size, 
finitely many states, finitely many cells, and finitely many symbols; a fixed bounded 
number of currently scanned symbols (actually just one); and a fixed bound on the 
maximal magnitude of tape shifts (actually by just one cell).²⁷ Moreover, the machine
operates in a purely mechanical way, in the sense that its local behavior is
determined exclusively by its program consisting of a finite set of indivisible
instructions whose execution is purely mechanical (involving no “creative thinking”).
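
The machine described above can be sketched in a few lines of Python. This is only a minimal illustration under our own assumptions, not Turing's formalism: the names run, program, BLANK, SUCC and the state names q0 and halt are hypothetical, and the tape is modeled as a dictionary so that it is potentially infinite in both directions. The program is a finite table that maps each pair (current state, scanned symbol) to a symbol to write, a shift by at most one cell, and the next state.

```python
from collections import defaultdict

BLANK = "_"  # assumed notation for the blank symbol


def run(program, tape, state="q0", halt="halt", max_steps=10_000):
    """Run a one-tape machine and return the non-blank part of its tape.

    program maps (state, scanned symbol) -> (symbol to write, move, next state),
    where move is -1 (left), 0 (stay) or +1 (right).
    """
    cells = defaultdict(lambda: BLANK, enumerate(tape))  # unbounded tape
    head = 0   # exactly one cell is scanned at any moment
    steps = 0
    while state != halt:
        if steps == max_steps:
            raise RuntimeError("step budget exhausted before halting")
        # The action depends only on the pair (state, scanned symbol).
        write, move, state = program[(state, cells[head])]
        cells[head] = write
        head += move
        steps += 1
    used = [i for i, s in cells.items() if s != BLANK]
    if not used:
        return ""
    return "".join(cells[i] for i in range(min(used), max(used) + 1))


# Hypothetical example program: append one stroke to a block of strokes.
SUCC = {
    ("q0", "1"): ("1", +1, "q0"),     # skip over the strokes
    ("q0", BLANK): ("1", 0, "halt"),  # write one more stroke and halt
}

print(run(SUCC, "111"))  # prints 1111
```

The step budget max_steps is a practical safeguard of the sketch only; the machine itself has no such bound on time or tape.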


Machine operations. Next, Turing claimed that the operations used by the Turing 
machine encompass all the “operations” that a human uses in the “computation” of 
a number: 


It is my contention that these operations [of the Turing machine] include all those which are 
used [by a human] in the computation of a number. 


This claim is today sometimes called Turing’s Thesis (in terms of operations). 


Turing’s Thesis (operations) Operations of the Turing machine include all the 
“operations” that a human uses in the “computation” of a number. 


Here Turing reached the point where the two theses had to be proved.²⁸ We
describe how he overcame the difficulties he faced in his search for a proof.


Justification of Turing’s Thesis (numbers) 


Could Turing rigorously prove that Turing-computable numbers make up a set $\mathcal{T}$
that contains the set $\mathcal{N}$ of all “computable” numbers? The usual way of proving
the relation $\mathcal{N} \subseteq \mathcal{T}$ would be to prove that $x \in \mathcal{N}$ implies $x \in \mathcal{T}$, for every $x$. In
Turing’s case, he would have to prove that any “computable” number $x \in \mathcal{N}$ is also
Turing-computable, $x \in \mathcal{T}$. This, however, wouldn’t work in Turing’s favor. The reason was
that the informal notion of a “computable” number might implicitly involve vaguely 
understood, informal ingredients of human mental “processes”, such as “creative 
thinking” (“insight’, “intuition”, and “ingenuity”). But what exactly is “creative 


27 Turing’s and Post’s ideas bear strong resemblance but they pursued their research independently. 
28 In fact, the theses were stated in §§0,1, clarified in §§2,3,4,5 and justified in §§9,10 of the paper.
thinking” or “insight”, “intuition”, and “ingenuity”? How can we know exactly when
none of them is present in a human “computation”? That is, when exactly does
a human “compute” a number $x$ in a purely mechanical way, so that $x \in \mathcal{N}$ beyond
any doubt? Turing would have needed precise answers to these questions in order
to rigorously prove the relation $x \in \mathcal{N} \Rightarrow x \in \mathcal{T}$, because the machine he defined
has no “insight”, “intuition”, “ingenuity” or any other kind of creative thinking.
But, of course, he was well aware that answers to these questions would necessarily
be intuitive and imprecise, and so too would be the proof of $x \in \mathcal{N} \Rightarrow x \in \mathcal{T}$.
In his words:²⁹


No attempt has been made to show that the “computable” numbers [set $\mathcal{T}$] include all
numbers which would naturally be regarded as computable [set $\mathcal{N}$]. All arguments which
can be given are bound to be, fundamentally, appeals to intuition, and for this reason rather 
unsatisfactory mathematically. 


Justification of Turing’s Thesis (operations) 


What about Turing’s Thesis (operations)? We can demonstrate that this thesis implies 
Turing’s Thesis (numbers). (If a Turing machine were able to imitate every “opera- 
tion” used by a “computing” human, then it could compute anything the human can 
“compute”; since this could be a number, $\mathcal{N} \subseteq \mathcal{T}$ would follow.) So Turing focused
on his second thesis. To prove it rigorously, he should prove that every “process” a 
human carries out in his “computation” of a number can be reduced to (or simulated 
by) a process carried out by the Turing machine. But to do the latter, Turing should 
better understand the “processes” involved in human “computation”. As he said: 


The real question at issue is “What are the possible processes which can be carried out [by 
a human] in computing a number?” 


However, human “process” is an informal idea lacking a precise characterization. 
Because of this, finding a rigorous proof that a Turing machine can simulate every 
“process” of a “computing” human seemed to be rather difficult, if not impossible. 


Human “operations”. In this situation, Turing reflected on how the “processes” 
carried out by a human in his “computation” could be convincingly analyzed to an 
extent that would allow a Turing machine to plausibly imitate them. So he proceeded 
with an informal analysis of the relevant “processes” carried out by a “computing” 
human. (He called such a human the computer.) Turing assumed that the human 
computer is on the one hand ideal (in the sense that he is not bounded by any practical
limits on the time, space, or other resources required for the “computation”), yet
still limited by nature (in his memory capacity and sensory capability) on the other. 
Here is Turing’s informal analysis: 


29 Warning: In this passage “computable” means Turing-computable, not informally computable.
[(i)] [Human] [c]omputing is normally done by writing certain symbols on paper. We may
suppose that this paper is divided into squares like a child’s arithmetic book. [...] [T]he 
two-dimensional character of paper is no essential of computation. I assume then that the 
[human] computation is carried out on one-dimensional paper, i.e. on a tape divided into 
squares. [(ii)] I shall also suppose that the number of symbols which may be printed 
is finite. If we were to allow an infinity of symbols, then there would be symbols differing 
to an arbitrarily small extent. The effect of this restriction of the number of symbols is 
not very serious. It is always possible to use sequences of symbols in the place of single 
symbols. Thus an Arabic numeral 17 or 999999999999999 is normally treated as a single 
symbol. Similarly in any European language words are treated as single symbols. (Chinese, 
however, attempts to have an enumerable infinity of symbols.) The differences from our 
point of view between the single and compound symbols is that the compound symbols, if 
they are too lengthy, cannot be observed [by a human] at a glance. This is in accordance with 
experience. We cannot tell at a glance whether 9999999999999999 and 999999999999999 
are the same. [(iii)] The behaviour of the [human] computer at any moment is determined
by the symbols which he is observing, and his “state of mind” at that moment. [(iv)] We
may suppose that there is a bound B to the number of symbols or squares which the [human] 
computer can observe at one moment. If he wishes to observe more, he must use successive 
observations. [(v)] We will also suppose that the number of states of mind which 
need to be taken into account is finite. The reasons for this are of the same character as 
those which restrict the number of symbols. If we admitted an infinity of states of mind, 
some of them will be “arbitrarily close” and will be confused. Again, the restriction is not 
one which seriously affects [human] computation since use of more complicated states of 
mind can be avoided by writing more symbols on the tape. [(vi)] Let us imagine 
the operations performed by the [human] computer to be split up into “simple operations” 
which are so elementary that it is not easy to imagine them further divided. [(vii)] Every
such operation consists of some change of the physical system consisting of the [human] 
computer and his tape [paper]. We know the state of the system if we know the sequence 
of symbols on the tape, which of these are observed by the [human] computer [...], and 
the state of mind of the [human] computer. [(viii)] We may suppose that in a simple 
operation not more than one symbol is altered. Any other changes can be split up into 
simple changes of this kind. The situation in regard to the squares whose symbols may be 
altered in this way is the same as in regard to the observed squares. We may, therefore, 
without loss of generality, assume that the squares whose symbols are changed are always 
“observed” squares. Besides these changes of symbols, the simple operations must include 
changes of distribution of observed squares. The new observed squares must be immediately 
recognizable by the [human] computer. I think it is reasonable to suppose that they can only 
be squares whose distance from the closest of the immediately previously observed squares 
does not exceed a certain fixed amount. Let us say that each of the new observed squares is 
within L squares of an immediately previously observed square. [...]


This analysis isolated several issues concerning the “processes” and “operations” 
that compose human “computing”: (i) A human “computes” a number by using a
pen and paper of finite size divided into equally sized squares to contain symbols. 
This is his working space. Of course, when necessary, he can add an additional 
finite amount of paper. Turing noted that the usual two-dimensional paper can be 
replaced by one-dimensional paper tape, which is also divided into equally sized 
squares and potentially infinite: When necessary, additional paper tape with finitely 
many squares can be added to the right or left end of the paper tape. The transi- 
tion to a one-dimensional working space is not an essential limitation since two- or 
higher-dimensional arrays of squares can be represented by one-dimensional ones.³⁰


Note also that a jump to a nonadjacent square in a two- or higher-dimensional 
paper can be simulated by a finite sequence of moves to adjacent squares in the 
one-dimensional paper tape. Thus, at any moment of his “computation”, the work- 
ing space (paper tape) of a human is a finite sequence of squares that can con- 
tain certain symbols which he has previously written, at most one symbol to a 
square. (ii) Turing then justified that the human needs only finitely many sym- 
bols to carry out a “computation”. For if there were uncountably many symbols, 
then—because of the finite size of the square—some of them would be too sim- 
ilar to be distinguishable; even if there were only countably infinitely many, say 
$a_1, a_2, a_3, \ldots$, then of course they could be encoded by just two symbols, say $s$ and
$'$, so that $a_1 = s'$, $a_2 = s''$, $a_3 = s'''$, ... but the problem of distinguishing between
them would remain. (iii) At any step of the “computation”, the human action is 
determined by (a) the symbols in the currently observed squares, and (b) his current 
“state of mind”.³¹ (iv) Due to his limited sensory capability, there must be a fixed
bound B on the number of squares (and their symbols) that the human can observe 
at one moment. If he wishes to observe more, he must use successive observations. 
(v) Due to his limited memory capacity, the human can only use finitely many 
“states of mind” to carry out the “computation”; otherwise, some of the “states of 
mind” would become indistinguishable.³² (vi) Turing recognized that the “processes”
that the human carries out in the “computation” can be split up into simple,
indivisible elementary “operations”. (vii) Generally speaking, each “simple op- 
eration” changes the state of the system consisting of the human and paper tape, 
where the state of the system consists of (a) the sequence of all symbols written on 
the paper tape; (b) the symbols in the currently observed sequence of squares of the 
paper tape; and (c) the current “state of mind” of the human. (viii) More precisely, 
each “simple operation” can (a) alter not more than one symbol, which, in addition,
must be currently observed and have been recognized by the human; (b) alter
the currently observed sequence of squares, provided that the distance between any 
newly and any immediately previously observed square is less than a fixed bound L, 
and that the symbols in the newly observed squares are immediately recognizable 
by the human; and (c) alter the current “state of mind” of the human. 
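
Point (ii) above can be made concrete with a small sketch. Assuming, for illustration only, that the symbol a_n is written as s followed by n primes, the hypothetical functions encode and decode below show that two symbols suffice to represent countably many, but that recognizing a_n requires reading n + 1 squares one after the other, so the compound symbol cannot be taken in at a glance.

```python
def encode(n):
    """Write the symbol a_n (n >= 1) as 's' followed by n primes."""
    if n < 1:
        raise ValueError("symbols are numbered from 1")
    return "s" + "'" * n


def decode(tape):
    """Recover the indices of the compound symbols written on the tape.

    The tape is read one square at a time; each 's' starts a new
    compound symbol and each prime increases its index by one.
    """
    indices = []
    for square in tape:
        if square == "s":
            indices.append(0)
        elif square == "'" and indices:
            indices[-1] += 1
        else:
            raise ValueError(f"unexpected square content: {square!r}")
    return indices


tape = "".join(encode(n) for n in [1, 3, 2])
print(tape)          # prints s's'''s''
print(decode(tape))  # prints [1, 3, 2]
```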


Then Turing summed up the isolated “simple operations” of human “computation”: 


The most general single operation [of a human in computation] must therefore be taken to 
be one of the following: 

(A) A possible change of symbol together with a possible change of state of mind. 

(B) A possible change of observed squares, together with a possible change of state of mind. 
The operation actually performed is determined [...] by the state of mind of the [human] 
computer and the observed symbols. In particular, they determine the state of mind of the 
[human] computer after the operation is carried out. 


30 See Sect. 6.1.2, p. 118 and Sect. 6.1.3, p. 121. 
31 The “state of mind” of a human is an informal notion which we only comprehend intuitively. 


32 This is the weakest of Turing’s assumptions for lack of our understanding of human “states of 
mind”. 
Reduction of “human” operations to machine operations. Now Turing returned
to his machine to show how the machine can simulate any “simple operation” of a 
computing human and hence any human “computation” of a number. In his words: 


We may now construct a machine [Turing machine] to do the work of this [human] com- 
puter. To each state of mind of the [human] computer corresponds an “m-configuration” 
[state] of the machine. The machine scans B squares [cells] corresponding to the B squares 
observed by the [human] computer. In any move the machine can change a symbol on a 
scanned square [cell] or can change any one of the scanned squares [cells] to another square 
[cell] distant not more than L squares [cells] from one of the other scanned squares [cells]. 
The move which is done, and the succeeding configuration [pair (current state, scanned 
symbol)], are determined by the scanned symbol and the m-configuration [state]. 


It should now be evident that Turing defined his machine (i) according to the 
results of his analysis of human “computation” and (ii) according to his intention to 
simulate “computation” by the machine.


Summary of Turing’s ideas 


Turing’s construction of the Turing machine clearly shows that he conceived his 
machine with the intention of providing a finite and mechanical description of the 
“computations” performed by a human whose mental and physical capabilities are 
limited (and hence realistic), while the time and space available for human “com- 
putation” are not limited (thus allowing the “computation” carried out by one hu- 
man to be continued by another human). The method that Turing followed was to 
(i) analyze the informal idea of a “computation” of a number carried out by a hu- 
man, and remove, one after the other, successive layers of all the irrelevant details; 
(ii) distill the observable relevant “processes” that compose such “computations”; 
(iii) isolate the elementary, indivisible “simple operations” that make up the “pro- 
cesses”; (iv) identify the key properties of the “simple operations”; (v) define an 
abstract computing machine, the Turing machine; and, finally, (vi) demonstrate that 
the “simple operations” of a human can be simulated by the Turing machine. 

In his paper, Turing presented his work in a different order, which enabled him 
to describe the implications of his discovery before giving justifications of it. 


The Usual Turing’s Thesis (functions) 


What about “computable” functions? While Godel, Church, and others searched 
for a characterization of “computable” functions, Turing investigated “computable” 
numbers. Why did he limit his attention to numbers? As said, Turing learned of 
the research pursued at Princeton just as he was ready to send off his manuscript. 
Nevertheless, at the beginning of his paper he did explain his approach: 


Although the subject of this paper is ostensibly the computable numbers, it is almost equally 
easy to define and investigate computable functions of an integral variable or a real or com- 
putable variable, computable predicates, and so forth. The fundamental problems involved 
are, however, the same in each case, and I have chosen the computable numbers for explicit 
treatment as involving the least cumbrous technique. 
For instance, we can say that the function $g : \mathbb{N} \to \{0,\ldots,9\}$ defined by $g(n) = x_n$
is Turing-computable iff the number $x = 0.x_1x_2\ldots \in (0,1)$ is Turing-computable.
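
As an illustration of this correspondence between a number and its digit function, take x = √2 − 1 = 0.41421356... (our choice of example, not Turing's). The sketch below computes g(n) = x_n with exact integer arithmetic only; the names g and x_prefix are hypothetical. It relies on the fact that the integer part of √2 · 10^n ends in the n-th decimal digit of √2, which is also the n-th decimal digit of x.

```python
from math import isqrt


def g(n):
    """Return x_n, the n-th decimal digit (n >= 1) of x = sqrt(2) - 1."""
    if n < 1:
        raise ValueError("digits are numbered from 1")
    # isqrt(2 * 10**(2n)) is the integer part of sqrt(2) * 10**n.
    return isqrt(2 * 10 ** (2 * n)) % 10


def x_prefix(n):
    """Return the decimal expansion of x truncated after n digits."""
    return "0." + "".join(str(g(k)) for k in range(1, n + 1))


print(x_prefix(8))  # prints 0.41421356
```

Computing g for every n yields the whole expansion of x, and conversely any procedure that writes out the expansion of x yields g.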


So how can we extend Turing’s discoveries to computation of function values? 
Turing justified that any humanly “computable” number is also Turing-machine 
computable. What if a number to be “computed” is the value of a function at a 
given argument? 


Let $f : \mathbb{R} \to \mathbb{R}$ be an arbitrary function, $x \in \mathbb{R}$ an arbitrary argument, and suppose
that $f(x)$ is defined, i.e., $f(x)\downarrow$. Human “computation” of the number $f(x)$ can be
viewed as a finite sequence of numbers written on the paper tape such that (i) the last
number of the sequence is f(x), and (ii) each number of the sequence is either the 
argument x or has been directly “computed” from some of the preceding numbers 
of the sequence by one of the “simple operations” which—according to Turing’s 
analysis—a human can carry out in his “computation” of a number. In the case that 
such a sequence exists, we informally say that f is “computable” in x. Then we 
informally say that the function f is “computable” iff it is “computable” in every x 
for which it is defined. In other words, a function is “computable” iff its value can 
be “computed” by a human whenever the value is defined.³³
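
The definition of a “computation” as a finite sequence can be checked mechanically once the admissible operations are fixed. The sketch below is a loose illustration under strong assumptions: the two operations double and add are hypothetical stand-ins that act on whole numbers, whereas Turing's “simple operations” act on single symbols, and the names OPERATIONS and is_computation are ours.

```python
from itertools import product

# Hypothetical stand-ins for the "simple operations": name -> (arity, function).
OPERATIONS = {
    "double": (1, lambda a: 2 * a),
    "add": (2, lambda a, b: a + b),
}


def is_computation(sequence, x, target):
    """Check that sequence is a "computation" of target from the argument x.

    (i)  The last number of the sequence must be target.
    (ii) Each number must be the argument x or the result of applying
         one operation to numbers that precede it in the sequence.
    """
    if not sequence or sequence[-1] != target:
        return False
    for i, value in enumerate(sequence):
        if value == x:
            continue
        earlier = sequence[:i]
        derived = any(
            op(*args) == value
            for arity, op in OPERATIONS.values()
            for args in product(earlier, repeat=arity)
        )
        if not derived:
            return False
    return True


# f(x) = 3x at x = 5: first double x, then add x to the result.
print(is_computation([5, 10, 15], x=5, target=15))  # prints True
print(is_computation([5, 15], x=5, target=15))      # prints False
```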


Now suppose that the function $f : \mathbb{R} \to \mathbb{R}$ is “computable,” that is,
$f \in \mathcal{F}$.


Let x be an arbitrary number for which f(x) is defined. Since f is “computable” in 
x, there is a sequence corresponding to the “computation” of the number f(x). But 
Turing has shown that his machine can simulate this “computation” by (a) simulating 
each “simple operation” of the human, and (b) writing down on its tape the intermediate
results (numbers) as notes assisting the machine’s memory. Hence, f(x) is a
Turing-computable number; for this reason we say that f is Turing-computable in x. 
We have justified that if f is “computable” in x, it is also Turing-computable in x. 
Since we supposed that f is “computable” (i.e., “computable” in every x for which it 
is defined), it follows that f is Turing-computable in every x for which it is defined. 
Such a function f is said to be a Turing-computable function. If we denote by $\mathcal{T}$ the
class of all Turing-computable functions, we conclude that


$f \in \mathcal{T}$.


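We have justified that $f \in \mathcal{F} \Rightarrow f \in \mathcal{T}$, for any function $f : \mathbb{R} \to \mathbb{R}$. This means that


$\mathcal{F} \overset{\text{just}}{\subseteq} \mathcal{T}$,


where $\overset{\text{just}}{\subseteq}$ denotes a justified (informally proved) inclusion. This was the so-called
hard half of Turing’s Thesis (functions).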


33 Note the similarity between this definition of a “computation” and the definition of a deduction 
(see Sect. 2.1.1, p. 10). This will become important in Kripke’s argument (see Sect. 16.4.3, p. 340). 
The easy half is to justify the converse:


$\mathcal{F} \overset{\text{just}}{\supseteq} \mathcal{T}$.


Here is a proof: A human can imitate the operations of the Turing machine as these 
do not violate any of the limitations of human memory or human sensory capability. 


In sum, we have justified that


$\mathcal{F} \overset{\text{just}}{=} \mathcal{T}$,


where $\overset{\text{just}}{=}$ denotes a justified (informally proved) equality. This is a concise form of
the usual Turing’s Thesis. 


Turing’s Thesis. A function is “computable” iff it is Turing-computable.


Reactions to Turing’s Thesis 


Gödel accepted Turing’s analysis and Turing’s Thesis as satisfactorily convincing.
In addition to that, Turing’s method partly complied with Gödel’s suggestions to
Church in 1934 (see p. 320). 


Church endorsed Turing’s characterization of “computability” in his review in 1937: 


[C]omputability by a Turing machine [...] has the advantage of making the identification 
with effectiveness in the ordinary (not explicitly defined) sense evident immediately. 


Kleene wrote: 


For rendering the identification with effective calculability the most plausible—indeed, I 
believe compelling—Turing computability has the advantage of aiming directly at the goal. 


Post, whose paper from 1936 contains the same idea as Turing’s, accepted Turing’s 
Thesis and soon proposed some improvements to the definition of a Turing machine. 


Turing’s work also convinced Gödel of Church’s Thesis. Here is what Kleene wrote:


According to a November 29, 1935, letter from Church to me, Gödel “regarded as thoroughly
unsatisfactory” Church’s proposal to use λ-definability as a definition of effective
calculability. [...] It seems that only after Turing’s formulation appeared did Gödel accept
Church’s thesis. 


But there were more reasons to endorse Turing’s work. Namely, Tarski and Gödel
considered Turing computability and the equivalent (general) recursiveness of great 
importance for metamathematical reasons, because the two proposed concepts ab- 
solutely defined the epistemological notion of computability. Why? 
In 1936, Gödel had already proved that, given a sequence of increasingly strong formal
axiomatic systems (f.a.s.) $F_1, F_2, \ldots$, passing from $F_i$ to a stronger $F_{i+1}$ may
enable us to prove certain propositions that were not provable in $F_i$. So Gödel
defined the concept of a function (of positive integers) computable in an f.a.s.
and focused on the provability of propositions about such functions in different
f.a.s. He discovered that once we have a sufficiently strong f.a.s. $F_k$ (which is in
fact Formal Arithmetic³⁴ A), there is no need to consider stronger f.a.s. $F_{k+1}, \ldots$,
because anything about such functions that can be proved in a stronger f.a.s. can
already be proved in $F_k$ (= A).

In the particular case of the provability of function computability this means that 
a function of positive integers is computable in any f.a.s. F containing A iff it is 
computable in A. It also means that the notion of computability of functions of pos- 
itive integers is epistemologically invariant (stable) in all f.a.s. that contain A. In 
this sense, the definition of function computability in terms of Turing computability 
or (general) recursiveness is absolute (i.e., stable from $F_k$ = A on). In 1946, Gödel
briefly and clearly described this fortunate situation:³⁵


Tarski has stressed in his lecture (and I think justly) the great importance of the concept 
of general recursiveness (or Turing’s computability). It seems to me that this importance is 
largely due to the fact that with this concept one has for the first time succeeded in giving 
an absolute definition of an interesting epistemological notion, i.e., one not depending on 
the formalism chosen. In all other cases treated previously, such as demonstrability or de- 
finability, one has been able to define them only relative to a given language, and for each 
individual language it is clear that the one thus obtained is not the one looked for. For the 
concept of computability however, although it is merely a special kind of demonstrability or 
decidability the situation is different. By a kind of miracle it is not necessary to distinguish 
orders, and the diagonal procedure does not lead outside the defined notion. 


In 1951, Gödel summarized the results important to mathematics and philosophy of
the last few decades and emphasized Turing’s role in this: 


Research in the foundations of mathematics during the past few decades has produced some 
results of interest, not only in themselves, but also in regard to their implications for the 
traditional philosophical problems about the nature of mathematics. [...] The greatest im- 
provement was made possible through the precise definition of the concept of the finite 
procedure [“equivalent to the concept of a ‘computable function of integers’”], which plays
a decisive role in these results. There are several different ways of arriving at such a defi- 
nition, which, however, all lead to exactly the same concept. The most satisfactory way, in 
my opinion, is that of reducing the concept of a finite procedure to that of a machine with a 
finite number of parts, as has been done by the British mathematician Turing. 


34 See Sect. 3.2, p.45. 


35 At the time it was already believed that the concept of “computability” is formalism-independent, 
in the sense that all formal characterizations of the concept pick out the same class of functions. 
16.4 Church-Turing Thesis


We have seen that, in 1936, the quest for the formalization of informal notions 
of computing brought to the surface two theses which concerned computability of 
functions, Church’s Thesis and Turing’s Thesis. Some researchers supported the first 
thesis and others approved the second one. Naturally, they started comparing the two 
theses and eventually spotted several differences, which we sum up in the following. 


16.4.1 Differences Between Church’s and Turing’s Theses 


The main differences between Church’s Thesis and Turing’s Thesis are: 


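1. Church simply and boldly postulated (defined) that
$\mathcal{F}^+ := \mathcal{G}^+ \ (= \mathcal{C}^+)$,


with the only indication for this being the rigorously proved confluence $\mathcal{G}^+ = \mathcal{C}^+$,


while Turing justified that $\mathcal{F} \overset{\text{just}}{\subseteq} \mathcal{T}$ and, along with the easy part $\mathcal{F} \overset{\text{just}}{\supseteq} \mathcal{T}$, that
$\mathcal{F} \overset{\text{just}}{=} \mathcal{T}$.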


2. Church provided in support of his thesis no reference to any means that could mechanically
carry out computation of (general) recursive or λ-definable functions,
while Turing established and justified his thesis by introducing an abstract com- 
puting machine and linking it to the intuitive idea of “computability”. The ability 
of this machine, the Turing machine, to encapsulate the fundamental logical prin- 
ciples of computation had a profound significance for the emerging science of au- 
tomatic computation and the realization of the stored-program general-purpose 
digital computer. 


3. Church focused on functions of positive integers, while Turing considered func- 
tions of positive integers or Turing-computable real numbers. In fact, Turing’s 
concerns were even more general than Church’s: they also included computable 
predicates and functions of real variables. 


4. While the intention of both Church and Turing was to refer to functions “com- 
putable” by an idealized human, Turing’s additional intention was to provide an 
idealized description of numerical “computation” as a particular human activity. 
Because of this difference between their intentions, we say that Church’s and 
Turing’s Theses are not intensionally equivalent. 
16.4.2 The Church-Turing Thesis


In 1952, Kleene observed that by restricting attention to functions of positive integers
Turing’s Thesis reduces to


$\mathcal{F}^+ \overset{\text{just}}{=} \mathcal{T}^+$,


where $\mathcal{F}^+$ and $\mathcal{T}^+$ are the class of “computable” functions of positive integers and
the class of Turing-computable functions of positive integers, respectively. Since
Turing outlined a proof of $\mathcal{T}^+ = \mathcal{C}^+$ (in the Appendix to his paper of 1936), and
gave a full proof of $\mathcal{T}^+ = \mathcal{C}^+$ and $\mathcal{T}^+ = \mathcal{G}^+$ (in his paper of 1937), the following
was known about the classes of functions of positive integers:


Church’s Thesis: $\mathcal{F}^+ := \mathcal{G}^+ \ (= \mathcal{C}^+)$
Turing’s Thesis (for positive integers): $\mathcal{F}^+ \overset{\text{just}}{=} \mathcal{T}^+$


So, after restricting to functions of positive integers, both theses characterized the 
same class, $\mathcal{F}^+$, in spite of the fact that their original intentions were distinct.
This is why we say that Church’s and Turing’s Theses are extensionally equivalent. 


Now note that we can combine the two theses into one. Start with Turing’s Thesis
(for functions of positive integers) and equate its right-hand side ($\mathcal{T}^+$) with the
right-hand side of Church’s Thesis, bypassing Church’s postulated definition (:=). 
The resulting thesis is today called the Church-Turing Thesis (see below the line): 


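Church’s Thesis: $\mathcal{F}^+ := \mathcal{G}^+ \ (= \mathcal{C}^+)$
Turing’s Thesis (for positive integers): $\mathcal{F}^+ \overset{\text{just}}{=} \mathcal{T}^+$
----------------------------------------------------------------
Church-Turing Thesis: $\mathcal{F}^+ \overset{\text{just}}{=} \mathcal{T}^+ = \mathcal{G}^+ = \mathcal{C}^+$


The new thesis is often shortened to
$\mathcal{F}^+ \overset{\text{just}}{=} \mathcal{T}^+ \ (= \mathcal{C}^+)$,
but still called the Church-Turing Thesis.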


Church-Turing Thesis. A function of positive integers is “computable” iff 
it is Turing-computable (or λ-definable).
16.4.3 Justifications of the Church-Turing Thesis


Since the inception of the Church-Turing Thesis, several types of arguments have 
been given to justify it. In the following we classify them according to Kleene and 
Shoenfield:³⁶


1. The argument from non-refutation. 

In spite of sustained and ongoing attempts to find a counterexample (e.g., a
“computable” function that is not Turing-computable), the Church-Turing The- 
sis has never been refuted. All the “computable” functions that researchers have 
constructed have been shown to be Turing-computable. Moreover, all the known 
techniques for constructing new “computable” functions from old ones have been 
shown to lead from Turing-computable functions to Turing-computable func- 
tions. Thus, there is no idea how to produce a “computable” but not Turing- 
computable function. 


2. The argument from replacement. 
As Shoenfield pointed out, we become convinced of the Church-Turing Thesis 
after detailed study of Computability Theory, because all the results of the theory 
become quite reasonable or even obvious when the term Turing-computable is 
replaced by the term “computable”. 


3. The argument from confluence. 

The first proposals for the characterization of function “computability”, that is,
λ-definability, Turing computability, μ-recursiveness, and Post’s finite combinatory
processes, quickly proved to be extensionally equivalent, i.e., they defined
the same subclass $\mathcal{T}^+ = \mathcal{G}^+ = \mathcal{C}^+ = \mathcal{K}^+ = \mathcal{P}^+$ of the class $\mathcal{F}^+$ of all “computable”
functions of positive integers. Subsequently, many other proposals for the
characterization of $\mathcal{F}^+$ were given that were quite different in approach and formal
details, such as Markov algorithms,³⁷ RAM,³⁸ and cellular
automata. But again, each of them turned out to encompass the very same subclass
$\mathcal{T}^+ \ (= \mathcal{G}^+ = \mathcal{C}^+ = \mathcal{K}^+ = \mathcal{P}^+)$ of $\mathcal{F}^+$. So the class $\mathcal{T}^+$ turned out to
be significantly formalism-independent, in the sense that all known, superficially
diverse formalisms pick out exactly this subclass of $\mathcal{F}^+$. This striking confluence
of many fundamentally different proposals for the characterization of $\mathcal{F}^+$
suggests that the distinguished class $\mathcal{T}^+ = \mathcal{G}^+ = \mathcal{C}^+ = \mathcal{K}^+ = \mathcal{P}^+ = \cdots$ is a
very natural class. It also gives us grounds for the belief that the confluence of
ideas will continue for the characterizations of $\mathcal{F}^+$ that may be proposed in the
future. All of this strengthens our confidence that $\mathcal{T}^+$ actually characterizes the
informally defined target class $\mathcal{F}^+$.


36 Joseph Robert Shoenfield, 1927-2000, American mathematical logician. 
37 See Sect. 5.2.3, p.93. 
38 See Sect. 6.2.7, p. 132. 
4. The argument from Turing’s analysis (Argument I).

Turing’s analysis (which we presented in Sect. 16.3.6) reveals five general and 
intuitive constraints satisfied by a “computing” human: (i) The human’s behavior is
at any moment determined by the current “state of mind” and the currently
observed symbols. (ii) There is a bound on the number of squares which a human
can observe at one moment. (iii) There is a bound on the distance between
the newly observed squares and the immediately previously observed squares.
(iv) In a simple operation, a human alters at most one symbol. (v) Only finitely
many human “states of mind” need to be taken into account. Then Turing’s
analysis demonstrates that if a human “computation” is subject to these five rea- 
sonable and natural constraints, then it can be simulated by a Turing machine, 
a finite machine that operates in a purely mechanical manner. Under these con- 
straints, any humanly “computable” function is also Turing-computable. This 
argument is often called Argument I. 
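
The five constraints can be made concrete with a small simulator. The following Python sketch is only an illustration under our own assumptions: the names `run_turing_machine`, `INCREMENT`, and `increment`, the blank symbol `_`, and the choice of a binary-increment program are hypothetical and are not taken from Turing’s paper. The machine observes one square at a time, alters at most one symbol per step, moves by at most one square, and has finitely many states.

```python
from typing import Dict, Tuple

# A transition maps (state, observed symbol) to (new state, written symbol, head move).
Program = Dict[Tuple[str, str], Tuple[str, str, int]]

BLANK = "_"  # assumed blank symbol


def run_turing_machine(program: Program, tape_input: str, start: str, halt: str,
                       head: int = 0, max_steps: int = 10_000) -> str:
    """Run a single-tape machine and return the tape contents without outer blanks."""
    tape = dict(enumerate(tape_input))  # sparse tape: position -> symbol
    state = start
    steps = 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step limit reached without halting")
        symbol = tape.get(head, BLANK)                   # (ii) one observed square
        state, written, move = program[(state, symbol)]  # (i) state and symbol decide
        tape[head] = written                             # (iv) at most one symbol altered
        head += move                                     # (iii) move by at most one square
        steps += 1
    if not tape:
        return ""
    cells = [tape.get(i, BLANK) for i in range(min(tape), max(tape) + 1)]
    return "".join(cells).strip(BLANK)


# (v) Finitely many states: "carry" and "done". The program adds 1 to a binary number,
# assuming the head starts on the rightmost bit.
INCREMENT: Program = {
    ("carry", "1"): ("carry", "0", -1),
    ("carry", "0"): ("done", "1", 0),
    ("carry", BLANK): ("done", "1", 0),
}


def increment(bits: str) -> str:
    """Binary successor computed by the machine above."""
    return run_turing_machine(INCREMENT, bits, start="carry", halt="done",
                              head=len(bits) - 1)


print(increment("1011"))  # prints 1100
```

The step limit is a practical safeguard of the sketch, not part of the model; an actual Turing machine may run forever.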


5. The argument from first-order logic (Argument II). 

In addition to Argument I, Turing gave an argument which he titled “A proof of 
the equivalence of two definitions” and which today is often called Argument II. 
This argument has mostly been ignored in discussions of Turing’s paper, perhaps 
because its presentation is somewhat demanding, while Argument I is accessi- 
ble and is well accepted by most of the research community. In Argument II, 
Turing stated that he had constructed a Turing machine that can generate all 
the provable formulas of the First-Order Logic L (he used the term Hilbert 
restricted functional calculus). Conversely, after showing that some sequences of 
symbols can be defined by formulas of L, he proved that sequences definable in 
L are Turing-computable. The latter statement is today often called Turing’s
Provability Theorem.


Theorem 16.2. (Turing’s Provability Theorem) Every formula provable in the 
First-Order Logic L can be proved by the universal Turing machine. 


As said, Turing expressed a belief that his thesis is not susceptible to rigorous 
proof because it is not a mathematically precise statement (see p. 330). Accord- 
ingly, his Argument I is an informal justification of the thesis. Turing’s intentions 
with Argument II were presumably similar, so he may have presented the above 
theorem as part of an intended justification. This is the basis of the next argument. 


6. Kripke’s Logical Orientation. 
In 2013, Kripke³⁹ revived Argument II. He took Turing’s Provability Theorem 
as the main point of Argument II, and building on this, he advocated a logi- 
cal orientation to the Church-Turing Thesis. In particular, Kripke proposed that 
derivability from a finite set of instructions expressible in a first-order language 
be accepted as a basic concept of computability. He explained the rationale be- 
hind the proposition as follows. Kripke assumed that the underlying motivation 


38 See Appendix A, p. 364. 
39 Saul Kripke, b. 1949, American philosopher and logician. 
of Argument II was Turing’s belief that his characterization of “computability”
in terms of his machines is equivalent to the characterization of “computability”
in terms of deducibility in a standard formal axiomatic system, e.g., First-Order
Logic with equality,⁴⁰ L. In Kripke’s opinion, therefore, Turing’s intention was
to argue that his Provability Theorem could serve in proving this equivalence.


Building on this, Kripke presented the following reconstruction of Argument II 
in which Turing’s Provability Theorem actually plays an important role: 


a) Premise: “Computation” is a special form of “deduction”. 


This is Kripke’s Claim. In contrast to the Church-Turing Thesis, the claim 
relates two informal notions. We do not expect that the claim can be rig- 
orously proved, but a justification for it can be given as follows. Suppose 
that one is given a list (i.e., recipe or “effective procedure”) of finitely many 
instructions, and perhaps some well-known and not explicitly stated math- 
ematical premises, which can be used during “computation”, such as the 
Peano axioms or premises about the values of some input and/or “computed” 
data. The “computation” is the “execution” of the given list of instructions. 
Every step in the “computation” is supposed to follow deductively from the 
“execution” of the current instruction and from the currently relevant im- 
plicit premises and/or explicit premises (i.e., facts) that have been established 
during the “computation”. The “computation” can therefore be viewed as a 
very specialized “deduction,” i.e., deductive “argument,” whose conclusion is 
a sentence precisely describing the result of the “computation”.⁴¹ It is in this
sense that Kripke advocated a logical orientation to the characterization of the
notion of “computation”. The following example illustrates Kripke’s claim.


Example 16.1. (“Computation” is a special form of “deduction”.) First a short reminder. We
say that a sentence p logically entails a sentence q (written p ⊨ q) iff every truth assignment
that satisfies p also satisfies q. Generally, a set P of sentences logically entails a sentence q
(i.e., P ⊨ q) iff every truth assignment that satisfies all of the sentences in P also satisfies q.
Now, the “computation” describing the usual “execution” of the following simple list


    I₁: a ← 3        // instruction I₁ stores 3 in a
    I₂: a ← a + 4    // instruction I₂ adds 4 to a


can be expressed as

    I₁; I₂


which means that the execution of I₁ is immediately followed by the execution of I₂.
The “deduction” corresponding to the “computation” I₁; I₂ can be expressed as


    1. I₁: a ← 3        // execution of I₁ with no premises
    2. a = 3            // follows from (logically entailed by) 1.
    3. I₂: a ← a + 4    // execution of I₂ with explicit premise 2.
    4. a = 7            // follows from (logically entailed by) 3.


Thus a = 7 is the conclusion of the “deduction” and 7 is the result of the “computation”.
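
The correspondence in Example 16.1 can be imitated in code. The sketch below is our own illustration and is neither Kripke’s nor Turing’s notation: the function `execute_with_deduction` and the encoding of an instruction as a (label, variable, right-hand side) triple are hypothetical. It executes a list of assignments and records, after each executed instruction, the fact that follows from it.

```python
from typing import Callable, Dict, List, Tuple

# An instruction is (label, target variable, function computing the new value
# from the facts established so far). This encoding is an assumption of the sketch.
Instruction = Tuple[str, str, Callable[[Dict[str, int]], int]]


def execute_with_deduction(
    instructions: List[Instruction],
) -> Tuple[Dict[str, int], List[str]]:
    """Execute the instructions in order and return the final store and the trace."""
    store: Dict[str, int] = {}    # explicit premises (facts) established so far
    deduction: List[str] = []     # the lines of the corresponding "deduction"
    for label, variable, right_hand_side in instructions:
        deduction.append(f"execute {label}")
        store[variable] = right_hand_side(store)
        deduction.append(f"{variable} = {store[variable]}")
    return store, deduction


program: List[Instruction] = [
    ("I1", "a", lambda facts: 3),               # I1: a <- 3
    ("I2", "a", lambda facts: facts["a"] + 4),  # I2: a <- a + 4
]

store, deduction = execute_with_deduction(program)
for number, line in enumerate(deduction, start=1):
    print(f"{number}. {line}")
```

The last recorded line, `a = 7`, plays the role of the conclusion of the “deduction”, and the final value of `a` in `store` is the result of the “computation”.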


40 See Sect. 3.2, p. 44. 


41 In his 1936 paper, Turing developed a detailed logical notation for expressing such “arguments”. 
See the footnote about the similarity between definitions of “computation” and deduction on p. 334. 
b) Premise: Every “deduction” can be expressed as a valid deduction in L. 


This statement Kripke called Hilbert’s Thesis. It claims that any “deduction” 
(i.e., any finite sequence of content-dependent inferences where each conclu- 
sion undoubtedly follows from the meaning of its premises) can be restated 
as a valid deduction (preserving truth from premises to the final conclusion) 
in a language based on First-Order Logic with equality, L. Since the thesis
identifies the intuitive notion of a “deduction” with the precise notion of a
valid derivation in a standard formal axiomatic system, it is believed, in
accordance with the prevailing opinion, to be justifiable only by appeal to intuition
(but see a critique of this opinion in Sect. 16.4.4).


c) Every “computation” can be expressed as a valid deduction in L. 
This follows directly from a) and b). 
d) Every valid deduction in L is provable in L. 


This follows directly from Gödel’s Completeness Theorem,⁴² which states that
L is semantically complete (a formula has a valid deduction iff it is provable).


e) Every “computation” is a provable deduction in L. 


This follows directly from c) and d). In other words, every human “computa- 
tion” can be proved in L, in the sense that each step of the formalized com- 
putation in L can be derived from the current instruction and possibly some 
well-defined premises. 


f) Every provable deduction in L is provable by the universal Turing machine.

This is Turing’s Provability Theorem (see above).


g) Conclusion: Every “computation” can be carried out by a Turing machine.

This follows from e) and f). So, every “computation” can be carried out by
the universal Turing machine.


In summary, under the two premises (Kripke’s and Hilbert’s), the Church-Turing 
Thesis follows from Gödel’s Completeness Theorem and Turing’s Provability 
Theorem. 


42 See Sect. 4.1.1, p. 58. 
Thus, Kripke demonstrated that the Church-Turing Thesis is deducible from his 
own claim and Hilbert’s Thesis. Since the latter is not susceptible to rigorous 
proof, Kripke’s argument doesn’t prove the Church-Turing Thesis, but reduces it 
to another thesis that also resists rigorous proof. 
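
The purely logical skeleton of steps a) to g) can be checked mechanically. The Lean 4 sketch below is our own illustration: the type `Process` and the predicates `Computation`, `Deduction`, `ValidInL`, `ProvableInL`, and `TuringComputable` are hypothetical and are left completely abstract. The sketch therefore confirms only that the conclusion follows from the four assumptions by chaining implications; it says nothing about whether the two informal premises a) and b) are true.

```lean
section KripkeArgument

-- Hypothetical, uninterpreted vocabulary for the argument.
variable {Process : Type}
variable (Computation Deduction ValidInL ProvableInL TuringComputable : Process → Prop)

theorem church_turing_from_premises
    (kripke_claim    : ∀ p, Computation p → Deduction p)          -- premise a)
    (hilbert_thesis  : ∀ p, Deduction p → ValidInL p)             -- premise b)
    (completeness    : ∀ p, ValidInL p → ProvableInL p)           -- step d)
    (turing_provable : ∀ p, ProvableInL p → TuringComputable p)   -- step f)
    : ∀ p, Computation p → TuringComputable p :=                  -- conclusion g)
  fun p h =>
    turing_provable p (completeness p (hilbert_thesis p (kripke_claim p h)))

end KripkeArgument
```

Steps c) and e) of the argument correspond to the two inner applications in the proof term.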


Nevertheless, this reduction allows one to adopt as a basic concept of com- 
putability either the machine-oriented Church-Turing thesis or the perhaps more 
intuitively appealing logically oriented Hilbert’s thesis (assuming that Kripke’s 
claim holds). This may have various advantages. One of them Kripke described 
as follows: 


Another relatively minor advantage of the approach of deducibility in a first-order lan- 
guage [...] is simply pedagogical. Ever since Post’s famous paper (1944) [see Post 
[184]], the advantage of an intuitive presentation of computability arguments has been 
evident, rather than the rival approach using a formal definition of computability in the 
proofs. Although the experienced computability theorist will know how to convert such 
arguments into proofs using a formal definition, and cannot be said to be relying on an 
unproved thesis, this is hardly true for a beginner. However, such a beginner, if he has 
already studied elementary logic, will readily accept that the steps of an argument can 
be stated in a first-order language, even if they are given verbally. The Gödel
completeness theorem guarantees that if the steps really do follow (using any implicit axioms
in addition to the actually stated steps), the argument can be formalized in one of the 
usual systems. Granted that the proof predicate [the predicate stating that a formula is 
provable from another formula] of such a system is recursive/computable by one of the 
usual definitions, that one really has a technically valid proof will be readily accepted. 


16.4.4 Provability of the Church-Turing Thesis 


Until recently, most logicians, mathematicians, and computer scientists shared the 
opinion that the Church-Turing Thesis is not amenable to rigorous proof. However, 
in 1990, Mendelson⁴³ gave a powerful critique of this general opinion. His critique 
consists of the following three points: 


1. Mathematics and logic already tolerate vagueness. The concepts and assump- 
tions supporting the notion of Turing-computable function are essentially no less 
vague and imprecise than the notion of “computable” function. 


For example, the concept of a function, which is used in the definition of a
Turing-computable function, is rigorously defined using the notion of a set:
a function f : A → B is a set f of ordered pairs (a, b), where a ∈ A, b = f(a) ∈ B,
and where (a, b) ∈ f and (a, c) ∈ f implies b = c. The notion of the ordered
pair (x, y) too is rigorously defined by the notion of the set: (x, y) = {{x}, {x, y}}.
In contrast, the notion of a set, being a basic notion, is not rigorously defined.⁴⁴


43 Elliott Mendelson, b. 1931, American logician. 
44 See Sects. 2.1.2, 2.1.3, and 3.2, p.47. 
Consequently, the concept of a function ultimately rests on an intuitively understood
and imprecisely defined notion of a set! But there is more to it than that. In
contrast with Turing-computable and “computable” functions, which are linked
by the Church-Turing Thesis, there is no thesis in mathematics claiming that the
above (seemingly) rigorous definition of a function f : A → B satisfactorily
characterizes the intuitive concept of a “function” (which understands f to be a “rule”
for “calculating” or “constructively assigning” b = f(a) ∈ B to any a ∈ A). The
concept of a function is so familiar and so well-tested in logic and mathematics
that such a thesis was never an issue. Mendelson pointed out that there are many
other intuitive notions (such as “a sentence being true in a structure”, “limit”,
“measure”, “dimension”) whose rigorous definitions ultimately rest on the intuitive
notion of a set, and the adequacy of these (seemingly) rigorous definitions
is never questioned.
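
The two set-theoretic definitions used in this item can be written out directly. The following Python sketch is an illustration under our own assumptions: the names `kuratowski_pair` and `is_function` are hypothetical, and Python’s `frozenset` stands in for the (undefined) notion of a set.

```python
from typing import Any, FrozenSet, Iterable, Tuple


def kuratowski_pair(x: Any, y: Any) -> FrozenSet:
    """Encode the ordered pair (x, y) as the set {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})


def is_function(pairs: Iterable[Tuple[Any, Any]]) -> bool:
    """Check that (a, b) and (a, c) in the given set of pairs imply b = c."""
    assigned = {}
    for a, b in pairs:
        if a in assigned and assigned[a] != b:
            return False  # a would be mapped to two different values
        assigned[a] = b
    return True


# The encoding distinguishes order, although sets themselves are unordered.
print(kuratowski_pair(1, 2) == kuratowski_pair(2, 1))  # prints False
print(is_function({(1, 2), (2, 2)}))                   # prints True
print(is_function({(1, 2), (1, 3)}))                   # prints False
```

Note that the sketch defines pairs and functions precisely only relative to `frozenset`, which mirrors the point of the text: the rigor of both definitions rests on a notion that is itself taken as basic.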


2. The intuitive and rigorous can be linked. The general assumption that a proof 
linking vague and precise mathematical notions is impossible is false. 


For example, the easy half of the Church-Turing Thesis, stating that all Turing- 
computable functions are “computable”, is widely acknowledged to be obvious 
and is therefore readily accepted. This is because there is a straightforward argu- 
ment for it, which, due to its simplicity, is acknowledged to be a proof, in spite 
of the fact that it involves the intuitive notion of “computability”.⁴⁵ So, the fact 
that the argument is not in ZF or some other formal axiomatic system is no draw- 
back (but, as Mendelson stressed, shows that there is more to mathematics than 
appears in ZF). 
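
The “straightforward argument” for the easy half can be mirrored in code: each construction of μ-recursive functions comes with an obvious procedure. The Python sketch below is only an illustration under our own assumptions: the names `zero`, `successor`, `projection`, `compose`, `primitive_recursion`, and `mu` are hypothetical, argument positions are counted from 0, and the functions work on natural numbers including 0 rather than on positive integers.

```python
from typing import Callable


def zero(*args: int) -> int:
    """Initial function: constant zero."""
    return 0


def successor(n: int) -> int:
    """Initial function: successor."""
    return n + 1


def projection(i: int) -> Callable[..., int]:
    """Initial function: select the i-th argument (0-based)."""
    return lambda *args: args[i]


def compose(f: Callable[..., int], *gs: Callable[..., int]) -> Callable[..., int]:
    """Composition: h(x) = f(g1(x), ..., gk(x))."""
    return lambda *args: f(*(g(*args) for g in gs))


def primitive_recursion(base: Callable[..., int],
                        step: Callable[..., int]) -> Callable[..., int]:
    """h(x, 0) = base(x) and h(x, n + 1) = step(x, n, h(x, n))."""
    def h(*args: int) -> int:
        *xs, n = args
        value = base(*xs)
        for k in range(n):
            value = step(*xs, k, value)
        return value
    return h


def mu(f: Callable[..., int]) -> Callable[..., int]:
    """mu-operation: the least n with f(x, n) = 0; loops forever if there is none."""
    def h(*xs: int) -> int:
        n = 0
        while f(*xs, n) != 0:
            n += 1
        return n
    return h


# Addition built only from the pieces above:
# add(x, 0) = x and add(x, n + 1) = successor(add(x, n)).
add = primitive_recursion(projection(0), compose(successor, projection(2)))
print(add(2, 3))  # prints 5
```

Each helper is itself a “procedure” in the intuitive sense, which is all that the easy half of the thesis requires; the possibility that `mu` does not terminate corresponds to partial functions.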


3. Proving is not the only way to ascertain the truth. The usual opinion that proof 
is the only way to ascertain the truth of the Church-Turing Thesis is false. 


In mathematics and logic, proof is not always the only way in which a statement 
comes to be accepted as true. Equivalences between intuitive notions and appar- 
ently more precise mathematical notions often are simply “seen” to be true with- 
out proof, or are based on arguments that are a mixture of such (non-empirical) 
intuitive perceptions and standard logical and mathematical reasoning. Some 
such intuitive notions were mentioned above in the first item. 


In summary, today it appears that the opinion that the Church-Turing Thesis is 
provable is gaining acceptance. Although several proofs of the Church-Turing The- 
sis have recently been proposed, the matter remains the subject of active discussion 
within the academic community. 


45 We describe the argument for μ-recursive functions. The initial functions are “computable”
(the “procedures” to “compute” their values are trivial). Composition, primitive recursion, and
μ-operation produce “computable” functions from “computable” functions (in each case we can
easily describe “procedures” that will “compute” the new “computable” function).
16.5 Résumé and Warnings 


Besides the original Church-Turing Thesis, the reader may find in various contexts 
similar-looking statements that are, for different reasons, mislabeled as the Church- 
Turing Thesis. Below we will address some of these look-alikes. 

Moreover, due to the fast development of general-purpose computers and the re- 
cent rise of different proposals for a new computing paradigm, several scientists 
have questioned the adequacy of the Church-Turing Thesis for the suggested para- 
digms, whether realistic (such as parallel and potentially quantum computation) 
or speculative ones (such as hypercomputation). Consequently, new versions of the 
Church-Turing Thesis have emerged, some of them only recently, each involv- 
ing or emphasizing a certain notion implicit in the original Church-Turing Thesis. 
These notions include implementability of algorithms, computational complexity, 
and physical computability. Not surprisingly, questions about the original Church- 
Turing Thesis and the proposed versions have arisen. Thus, the necessity, adequacy, 
and relative power of each of them are currently actively discussed within the aca- 
demic community. 

To keep up with this discussion, understanding the subtle differences between the 
original Church-Turing Thesis on the one side and its look-alikes or the emerging 
new versions on the other side is important for anyone interested in the foundations 
of Computability Theory. 


Résumé 


In 2017, Copeland⁴⁶ extracted the essence of Turing’s work in the following passage: 


Turing proved that his universal [Turing] machine can compute any function that 
any Turing machine can compute; and he put forward, and advanced philosophical 
arguments in support of, the thesis that effective methods [procedures] are to be 
identified with methods [procedures] that the universal Turing machine is able to 
carry out. 


Copeland then summed up: 


Essentially, the Church-Turing Thesis says that no human computer, or machine 
that mimics a human computer, can out-compute the universal Turing machine. 


46 B. Jack Copeland, b. 1950, English philosopher and historian of computing, and mathematical 
and philosophical logic. 
Some Warnings 


In relation to this, here are some potentially misleading claims that have appeared in 
various sources. The claims are only conditionally true because they go far beyond 
Turing’s discoveries: 


• Turing showed that the universal Turing machine can specify the steps required for the solution
of any problem that can be solved by instructions, explicitly stated rules, or procedures.


The claim is true only if the general terms “instruction”, “explicitly stated rule”,
and “procedure” are restricted so that they refer only to what can be done by
“effective procedures”.


• Turing had proven that his universal Turing machine can compute any function that any computer,
with any architecture, can compute.


The claim is true only if the term “computer” refers to the Turing machine or 
a machine equivalent to it. Turing proved that the universal Turing machine is 
universal with respect to other Turing machines (not to any machine capable of 
computing). 


• Every task for which there is a clear recipe composed of simple steps can be performed by a
very simple computer, the universal Turing machine.


The claim is true only if the term “clear recipe composed of simple steps” refers 
to an “effective procedure”. 


• Turing’s results entail that a standard digital computer can compute any rule-governed
input-output function.


The claim is true only if the term “rule-governed input-output function” refers to 
a Turing-computable function. 


• A function is computable by means of a machine (i.e., mechanically computable), iff it is
Turing-computable.


The Church-Turing Thesis does not address computability of functions by means 
of machines, so it does not imply the above claim. Conversely, if we accept that 
a human is a type of machine, then the claim implies the Church-Turing Thesis. 
Thus, the claim and the Church-Turing Thesis are not equivalent. 
16.6 New Questions About the Church-Turing Thesis 


In 2019, Copeland and Shagrir⁴⁷ addressed the open questions that concern the 
fundamental place of computing in the physical universe. After giving an overview 
of the original Church-Turing Thesis and its equivalent versions, they described and 
discussed today’s more powerful versions of this thesis. In the rest of this section, 
we follow their exposition (and abbreviate the Church-Turing Thesis to CTT). 


16.6.1 Original CTT 


First, of course, there is the original Church-Turing Thesis: 


(CTT) Any function that is intuitively computable is computable by some Turing machine. 


Copeland and Shagrir put it as follows: 


(Copeland-Shagrir (CTT-O)) Every function that can be computed by the idealized 
human computer (i.e., can be effectively computed), is Turing-computable. 


Now we describe algorithmic, complexity-theoretic, and physical versions of CTT. 


16.6.2 Algorithmic Versions of CTT 


These versions of the original Church-Turing Thesis involve the intuitive notion of
“algorithm”⁴⁸ by asking what can be computed by “algorithms” in general. Here
is an algorithmic version of CTT stated in 1981 by Lewis⁴⁹ and Papadimitriou⁵⁰:


(Lewis-Papadimitriou) [W]e take the Turing machine to be a precise formal equivalent of 
the intuitive notion of “algorithm”: nothing will be considered an algorithm if it cannot be 
rendered as a Turing machine. 


But computing machines have been developing; for example, they are now able— 
in contrast to Turing machines—to change several dislocated parts of data at each 
step of computation. Accordingly, the concept of algorithm has been evolving, 
so much so that certain steps of some kinds of algorithms cannot be directly carried 
out by the basic steps of Turing machines. For such an algorithm, a Turing machine 
with an essentially different Turing program must be designed if one wants to show 
that the problem under consideration is still Turing computable (and not just com- 
putable by the particular kind of algorithm). But how much can algorithms evolve 
and still remain reasonable relative to the Turing machine? That is, what is the up- 
per bound on the concept of an algorithm so that algorithmic computability entails 
Turing computability? Again we are faced with the question What is an algorithm? 


47 Oron Shagrir, b. 1961, Israeli philosopher of computing and cognitive/brain sciences. 
48 We use terms “algorithm” and “effective procedure” interchangeably. 

49 Harry Roy Lewis, b. 1947, American computer scientist and mathematician. 

50 Christos Harilaos Papadimitriou, b. 1949, Greek theoretical computer scientist. 
Besides various views of what kind of abstract entities are algorithms, it is also 
debated to what extent algorithms should be implementable, that is, reducible to 
restricted forms that are based on the familiar models of computation (e.g., Turing 
machine and RAM). Since implementable algorithms are expressible in terms of fa- 
miliar models of computation, do non-implementable algorithms make any sense? 
Theoretically, they do. To provide a development of the theory of algorithms within 
axiomatic set theory, Moschovakis⁵¹ adopted an abstract notion of algorithm (with
recursion as a primitive operation) that is so wide as to admit even non-implementable
algorithms. In contrast, focusing on the practical aspects of algorithms, Harel⁵²
advocates that an algorithm only has to be expressible in a “reasonable” programming
language. Thus, Harel suggested the following algorithmic version of CTT:

(Harel) [A]ny algorithmic problem for which we can find an algorithm that can be
programmed in some programming language, any language, running on some computer,
any computer, even one that has not been built yet but can be built, and even one that
will require unbounded amounts of time and memory space for ever-larger inputs, is also
solvable by a Turing machine.


Building on this and setting aside the issue of algorithm implementability, Copeland 
and Shagrir formulated in 2019 the following algorithmic version of CTT: 


(Copeland-Shagrir (CTT-A)) Every algorithm can be expressed by means of a program 
in some (not necessarily currently existing) Turing-equivalent programming language. 


16.6.3 Complexity-Theoretic Versions of CTT 


These are versions of the original Church-Turing Thesis involving issues related to 
Computational Complexity Theory, such as the relation between time complexities 
of computational problems in different (reasonable general) models of computation. 
That such models are connected in a particularly appealing way is stated by the 
Cobham⁵³-Edmonds⁵⁴ thesis, which has become part of computer science folklore.

(Cobham-Edmonds) If a computational problem’s time complexity is t in some (general
and reasonable) model, then its time complexity is assumed to be poly(t) in the single-tape
Turing machine model.


There are of course different views about which models count as reasonable, one 
being that a model is reasonable iff it is physically realizable (at least in principle). 
But why is this thesis important? Take an arbitrary NP-complete computational 
problem. Assuming the Cobham-Edmonds thesis, it follows that if the problem is 
not solvable in polynomial time on a single-tape Turing machine, then neither is it 
solvable in polynomial time in any other reasonable general model of computation. 
This means that the notion of polynomial (un)solvability of computational problems 
is robust, insensitive to the model of computation. And so is the question P = NP. 


51 Yiannis N. Moschovakis, b. 1938, Greek-American logician and theoretical computer scientist. 
52 David Harel, b. 1950, Israeli computer scientist. 

53 Alan Belmont Cobham, 1927-2011, American mathematician and theoretical computer scientist. 
54 Jack R. Edmonds, b. 1934, American computer scientist. 
Some forms of the Cobham-Edmonds thesis use a probabilistic Turing machine⁵⁵
instead of a single-tape Turing machine. For example, Bernstein⁵⁶ and Vazirani⁵⁷
pointed out that Computational Complexity Theory rests upon the following thesis:


(Bernstein-Vazirani) Any reasonable model of computation can be efficiently simulated 
on a probabilistic Turing machine. 


In 2013, Aharonov⁵⁸ and Vazirani reformulated the Bernstein-Vazirani thesis and 
stated the following Extended Church-Turing Thesis (CTT-E): 


(Aharonov-Vazirani (CTT-E)) [A]ny reasonable computational model can be simulated 
efficiently by the standard model of classical computation, i.e., a probabilistic Turing machine. 


Here, “classical” refers to any computation that is not quantum computation. This is 
because CTT-E is also used in quantum-computation research. And it is this research 
that brought about a potential counterexample to CTT-E. Namely, the fastest known
classical algorithm for the PRIME FACTORIZATION⁵⁹ problem runs in super-polynomial
time on any classical computer, while a quantum computer can solve the problem in polynomial
time by Shor’s quantum algorithm. Does this falsify CTT-E? Not quite, as some still
hold that a quantum computer is not a physically reasonable model of computation. 
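
To make the PRIME FACTORIZATION problem concrete, here is a deliberately naive classical method, trial division. It is a minimal Python sketch under our own assumptions: the name `prime_factors` is hypothetical, and trial division is not the fastest known classical algorithm. In the worst case it performs on the order of √n divisions, which is exponential in the number of digits of n, so it illustrates why the problem is considered hard classically rather than how it is attacked in practice.

```python
from typing import List


def prime_factors(n: int) -> List[int]:
    """Return the prime factors of n in nondecreasing order, using trial division."""
    if n < 2:
        raise ValueError("n must be at least 2")
    factors: List[int] = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:   # divide out every copy of the current divisor
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:                     # whatever remains is itself prime
        factors.append(n)
    return factors


print(prime_factors(84))  # prints [2, 2, 3, 7]
```

Verifying a proposed factorization only requires multiplying the factors back together, which fits the remark in the footnote that the problem is in class NP.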


16.6.4 Physical Versions of CTT 


These versions of the original Church-Turing Thesis involve physical reality by 
asking What can be computed by physical systems in general? For example, in 1985, 
Wolfram⁶⁰ suggested the following physical version of CTT: 


(Wolfram) [U]niversal computers are as powerful in their computational capacities as 
any physically realizable system can be, so that they can simulate any physical system. 


Also in 1985, and independently of Wolfram, Deutsch proposed a similar thesis, 
which is now named the Church-Turing-Deutsch-Wolfram Thesis (CTDW): 


(Church-Turing-Deutsch-Wolfram (CTDW)) Every finite physical system can be simulated 
to any specified degree of accuracy by a universal Turing machine. 


Here, “simulation” is not a perfect simulation in the sense of achieving absolute
accuracy; otherwise, CTDW would be falsified by classical continuous physical
systems, since these involve incomputable real numbers, as Deutsch proved.


55 A probabilistic TM picks one from a set of alternative transitions according to some probability 
distribution. 


56 Ethan Bernstein was a student of U. Vazirani. 
57 Umesh Virkumar Vazirani, Indian-American computer scientist. 
58 Dorit Aharonov, b. 1970, Israeli computer scientist. 


59 PRIME FACTORIZATION is the problem of decomposing a composite number into a product of 
prime numbers. Currently, no classical algorithm is known that can solve the problem in poly- 
nomial time. Neither the existence nor non-existence of such an algorithm has been proved (it is 
believed that it does not exist and that the problem is not in class P). The problem is in class NP. 
But it has not been proved to be or not be NP-complete (it is believed not to be NP-complete). 


60 Stephen Wolfram, b. 1959, British-American computer scientist, physicist, and businessman. 
Since simulation of a system is just computation of a particular aspect of the 
system’s behavior, namely the system’s time evolution from an initial state to a final 
state, Copeland and Shagrir suggested a much stronger version of the above thesis, 
calling it the Total Physical Computability Thesis (CTT-P): 


(Copeland-Shagrir (CTT-P)) Every physical aspect of the behavior of any physical system 
can be calculated (to any specified degree of accuracy) by a universal Turing machine. 


Here, “physical system” means any actual, non-actual, or idealized system whose 
behavior is in accordance with the actual laws of physics. 


Is CTT-P true? Copeland and Shagrir listed several potential counterexamples to 
CTT-P that have been presented by researchers. Here we describe two of them that 
have recently emerged from quantum mechanics: 


• In 2012, Eisert, Müller, and Gogolin proved that OUTCOME SEQUENCES⁶¹, a 
decision problem about the results of repeated quantum measurements, is unde- 
cidable. Remarkably, if we require that the measurements are classical instead of 
quantum, the problem becomes decidable. 


• In 2015, Cubitt, Perez-Garcia, and Wolf proved that the SPECTRAL GAP⁶²
problem is undecidable (the proof is by reduction of the problem HALTING
OF TURING MACHINES⁶³). So there cannot exist an algorithm or a computable
criterion that solves the SPECTRAL GAP problem in general. But SPECTRAL GAP 
is one of the most important physical properties of quantum many-body systems 
and, consequently, an important determinant of a material’s properties. Thus, a 
major physics problem is undecidable. 

How is this result connected to CTT-P? Cubitt et al. proved undecidability of 
the SPECTRAL GAP problem for an infinite system of two-dimensional lattices 
of atoms. But the proof also applies to finite systems whose size increases. Such 
systems are physically relevant and, as the proof revealed, there exists no effec- 
tive method for computing their future behavior from complete descriptions of 
their current and past states. So these finite systems of increasing size offer a 
counterexample to CTT-P. 

However, there is the open question of whether the SPECTRAL GAP problem be- 
comes decidable if all involved mathematical structures (e.g., Hilbert spaces) are 
bound to realistically low dimensions as imposed by the actual physical world. 


Although other theoretical counterexamples to CTT-P have also been
found, there is currently still no evidence that CTT-P is false in the actual universe.
But, of course, this does not imply that it is true. 


61 The problem OUTCOME SEQUENCES is to determine whether or not a given sequence can occur
as an outcome sequence in repeated quantum measurements.

62 A spectral gap is the energy difference between the ground state and first excited state of a
quantum many-body system. Such a system is gapless if it has a continuous spectrum above the 
ground state in the thermodynamic limit, and is gapped if it has a unique ground state and a constant 
lower bound on the spectral gap. The SPECTRAL GAP problem is to determine, given the matrices 
describing the local interactions of a many-body system, whether the system is gapless or gapped. 


63 See Sect. 8.3.1, p. 187.
In addition to those mentioned above, there are several other physical versions of
CTT in the literature. By extracting their shared essences, Piccinini⁶⁴ classified them
into two types of theses, calling them the bold and the modest physical CTT:


(Piccinini (BoldCTT-P)) Any physical process—anything doable by a physical system—is 
computable by a Turing machine. 


(Piccinini (ModestCTT-P)) Any function that is computable by a physical system is 
computable by a Turing machine. 


So, anything a physical system can do can be simulated by a TM (BoldCTT-P); and 
anything a physical system can compute can be computed by a TM (ModestCTT-P). 


But what is the difference between the two theses? Isn’t physical computation in a 
physical system something the system actually does? True, but the converse does not hold:
In BoldCTT-P, a physical process may not be physical computation in the sense of 
yielding the value of some dynamic physical variable, but it can be simply the time 
evolution of the physical system from its initial state at a given instant to a final state 
at some later instant. Moreover, even if a physical process is physical computation 
of the value of some dynamic physical variable, the process may be unusable in the 
sense that the resulting value cannot be obtained by an observer whose behavior is 
supposed to be appropriately influenced by the result of the physical process. Specif- 
ically, the observer, who can be a human being or a functionally organized system, 
may want to use the physical process in the physical system to obtain the value of a 
given function whose argument has been encoded in the system’s initial state. 


Thus, BoldCTT-P may not be about what an observer can discover about the values 
of a given function. Because of this, Piccinini argued that BoldCTT-P is irrelevant 
to the epistemological concerns motivating the original Church-Turing Thesis. 


Consequently, Piccinini stated his Usability Constraint, which distinguishes be- 
tween physical processes and physical computations as follows: 


A physical process should not count as a computation unless 
a finite⁶⁵ observer can use it to generate the desired values of a given function.


Then Piccinini proceeded to his ModestCTT-P, grounding it on a notion of physical
computation that now complies with the Usability Constraint. How did he
do that? We have seen that the Usability Constraint filters out all physical pro- 
cesses that are irrelevant because their results cannot be obtained and used by 
finite observers. The remaining physical processes are in this sense usable, but 
it is still unclear whether or not these processes must pass any additional con- 
straints in order to qualify as genuine physical computations. Facing the question of 
what a genuine physical computation is, Piccinini put forward the following open- 
ended list of constraints and their sub-constraints imposed on any physical process 
P and any physical system S if P is to be counted as a physical computation on S: 


64 Gualtiero Piccinini, b. 1970, Italian-American philosopher. 
65 A finite observer is an observer of bounded capacities. 
• Executability: P can be set in motion by a finite observer to generate the values
  of a given function until it generates a readable result. The sub-constraints are:
  – Readable Inputs and Outputs: The inputs and outputs of P must be readable,
    i.e., they can be specified and measured to the desired degree of accuracy.
  – Process-Independent Rule: The function computed by P must be definable
    independently of P (and similarly for the problem solved by P).
  – Repeatability: P can be repeated by any competent finite observer.
  – Settability: S can be reset to its initial state.
  – Physical Constructibility: S can be constructed by arranging the relevant
    physical materials.
• Automaticity: P must run with no intuitions, ingenuity, invention, or guesses.
• Uniformity: P does not need to be redesigned or modified for different inputs.
• Reliability: P generates results at least some of the time and the results are correct.
  The sub-constraints are:
  – S’s components must not break too often.
  – S is designed so that noise and external disturbances don’t interfere with results.


According to Piccinini’s classification and constraints, CTT-P is neither bold nor 
modest; indeed, it can be labeled “super-bold”. Therefore, Copeland and Shagrir 
weakened their CTT-P and stated the following modest thesis: 


(Copeland-Shagrir (CTT-P-C)) Every function computed by any physical computing 
system is Turing-computable. 


By CTT-P-C, if a function’s values can be computed by the physical processes 
of some physical system, then it can also be computed by some Turing machine. 
Equivalently, if a function is not Turing-computable, neither is it computable by 
any physical computing system. Here is an example. Since the Halting Problem is 
undecidable,⁶⁶ its characteristic function

\[
\chi_{K_0}(\langle m,n\rangle) =
\begin{cases}
1 & \text{if TM } T_m \text{ halts on } n;\\
0 & \text{if TM } T_m \text{ does not halt on } n
\end{cases}
\]

is not Turing-computable; so, if CTT-P-C is true, neither is it computable by
any physical computing system.


To sum up: If CTT-P-C is true, then Turing computability is the upper bound on
what CTT-P-C deems physically computable. In other words, if CTT-P-C is true, 
then Turing computability is a computational barrier that cannot be broken, and is 
sometimes called the Church-Turing barrier. 


But is CTT-P-C true? 


66 See Sect. 8.2, p. 180.
16.6.5 Hypercomputing?


While many researchers believe that CTT-P-C is true, several researchers have been 
trying to conceive a physical computing system that would breach the Church- 
Turing barrier. Recently, several such physical computing systems have been pro- 
posed. They are based on very different concepts, but their common intention is to 
squeeze infinitely many computational steps into a finite span of time. For example, 
accelerating machines carry out the next operation in half the time of the previous 
one, and relativistic machines make use of relativistic gravitational time dilation. 
Physical computing systems that would be capable of “computing” beyond the 
Church-Turing barrier are nowadays collectively called hypercomputers, and their 
modes of “computing” are collectively called hypercomputing. If physically real- 
istic, any hypercomputer would falsify CTT-P-C. However, to this day, no pro- 
posed hypercomputer has been proved to be entirely physically realistic; for now, 
hypercomputers remain notional computing systems. 
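
The idea behind accelerating machines can be checked with a little arithmetic. The sketch below is only an illustration: it assumes that the first operation takes one time unit (an assumption, as is the function name `elapsed_time`) and sums the durations of successive operations, each half as long as the previous one. The totals approach 2 time units without ever reaching it, which is why infinitely many steps would fit into a finite span of time.

```python
from fractions import Fraction


def elapsed_time(steps: int, first_step: Fraction = Fraction(1)) -> Fraction:
    """Total time for `steps` operations when each operation
    takes half as long as the previous one (hypothetical model)."""
    total = Fraction(0)
    duration = first_step
    for _ in range(steps):
        total += duration
        duration /= 2  # the next operation is twice as fast
    return total


# The partial sums 1, 3/2, 7/4, ... stay below 2 time units.
for n in (1, 2, 3, 10, 50):
    print(n, elapsed_time(n), float(elapsed_time(n)))
```

Exact fractions are used instead of floating-point numbers so that the gap to the limit remains visible even for many steps.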
We describe Németi’s⁶⁷ relativistic machine from 2006, a hypercomputer conceived
to compute the values of the halting function χ_K₀ by using relativistic phenomena.


Relativistic Machine. Let T_⊕ be a universal Turing machine and T_● an ordinary
Turing machine that can communicate with each other. (We will explain the meaning
of indexes ⊕ and ● shortly.) The pair (T_⊕, T_●) is a physical system intended to
compute the value χ_K₀(⟨m,n⟩) of the halting function χ_K₀ for any pair (m,n) ∈ N².
Here is how the system operates.


T_● has input ⟨m,n⟩
T_● writes χ_K₀(⟨m,n⟩) = 0
T_● sends ⟨m,n⟩ to T_⊕              →   T_⊕ starts simulating T_m on n
T_● waits t_● [sec]                      T_⊕ simulates t_⊕ [sec]
of its proper time                       of its proper time
                                    ←   if T_⊕ terminated in time t_⊕
                                         then T_⊕ sends HALTED to T_●
if T_● received HALTED in time t_●
then T_● writes χ_K₀(⟨m,n⟩) = 1
T_● halts


Fig. 16.2 The Relativistic Machine. Turing machine T_● writes provisional result χ_K₀(⟨m,n⟩) = 0
and commands the universal Turing machine T_⊕ to simulate T_m on input n. If T_⊕ terminates in time t_⊕,
it signals to T_●, which then replaces the provisional result with the final result χ_K₀(⟨m,n⟩) = 1.
But T_● operates in a stronger gravitational field than T_⊕, so T_●’s proper time is passing more
slowly than T_⊕’s proper time. In effect, T_● can wait reasonably long even for an unreasonably long
simulation


67 István Németi, b. 1942, Hungarian mathematician and theoretical computer scientist.
There is nothing spectacular in this scenario: T_● writes the provisional result
χ_K₀(⟨m,n⟩) = 0 on its tape and hands over the computation of χ_K₀(⟨m,n⟩) to T_⊕,
which, in turn, starts simulating T_m on n for at most t_⊕ seconds. (The simulation
may terminate sooner.) Meanwhile, T_● waits at most t_● > t_⊕ seconds for the result of
the simulation. Thus, T_● learns in at most t_● seconds whether or not T_⊕ terminated
the simulation in at most t_⊕ seconds. If it did, then, obviously, T_m halted on n, so
T_● replaces the provisional result on its tape with the final result χ_K₀(⟨m,n⟩) = 1
and halts; otherwise, nothing can be said about the final value χ_K₀(⟨m,n⟩), as the
simulation might either terminate later (for a larger t_⊕) or never terminate (even for
infinite t_⊕). (If the simulation does not terminate in t_⊕ seconds, the system (T_⊕, T_●)
could extend deadlines t_● and t_⊕. But this would only pay off if in truth T_m halts on n;
otherwise, (T_⊕, T_●) would keep extending t_● and t_⊕ indefinitely and never halt.)
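
The Earth-bound scenario just described amounts to a simulation with a deadline, which can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Németi's construction: Python generators stand in for Turing machines, one `yield` counts as one simulated step, a step budget replaces the deadline in seconds, and the names `simulate_with_deadline`, `collatz`, and `loop_forever` are hypothetical.

```python
from typing import Callable, Iterator


def collatz(n: int) -> Iterator[None]:
    """Toy 'machine' that halts when the Collatz iteration reaches 1."""
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        yield  # one simulated step


def loop_forever(n: int) -> Iterator[None]:
    """Toy 'machine' that never halts."""
    while True:
        yield


def simulate_with_deadline(program: Callable[[int], Iterator[None]],
                           n: int, budget: int) -> int:
    """Return 1 if `program` halts on `n` within `budget` resumptions;
    otherwise return the provisional result 0."""
    result = 0              # provisional result written first
    run = program(n)
    for _ in range(budget):
        try:
            next(run)       # simulate one more step
        except StopIteration:
            result = 1      # the HALTED signal arrived in time
            break
    return result


print(simulate_with_deadline(collatz, 6, 100))       # halts within the budget
print(simulate_with_deadline(collatz, 6, 5))         # budget too small: provisional 0
print(simulate_with_deadline(loop_forever, 6, 100))  # never halts: provisional 0
```

The last two calls return the same value, which is exactly the difficulty noted above: with any finite deadline, a result of 0 cannot distinguish a slow computation from one that never halts.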

We tacitly assumed that both T_● and T_⊕ reside on Earth and run under the same
physical conditions. What if they were on two very different astronomical objects? 


Box 16.1 (Black Holes). 

The General Theory of Relativity predicts that a sufficiently compact mass can deform spacetime 
into an astronomical object called a black hole, a spheroidally bounded region of spacetime exhibit- 
ing gravitational “pull” so strong that nothing can escape from it. The boundary of the region is 
called the event horizon and is the defining feature of the black hole. It is a bubble-like hypersurface 
in spacetime surrounding the region’s interior such that matter and radiation can pass only inward, 
towards the gravitational center of the black hole, and nothing can escape outward from the region. 
The exhibited gravitational “pull” tends to infinity as the distance to the event horizon decreases. 
Why? The black hole’s mass warps spacetime so that escaping paths bend back to the horizon. 

Any object that crosses the event horizon is irreversibly consumed by the black hole. There are 
no warnings; the object cannot detect any difference between the gravitational field of a black hole 
and that of any other spheroidal object of the same mass; neither can it detect any characteristic 
local observables. When it reaches and passes through the event horizon, the object feels nothing 
peculiar; e.g., the time it measures is perfectly smooth. (Of course, as the object approaches the 
event horizon, the gravitational field becomes very strong, and the big difference between the 
gravity forces exerted on the upper and lower parts of the object stretches, or “spaghettifies”, the 
object and can even tear it up. But such tidal forces act in every non-homogeneous gravitational 
field, not just near black holes.) If an event occurs inside the event horizon, no information about 
it can reach an outside observer, making it impossible to determine anything about the event. 

A black hole can form when, at the end of its life cycle, a massive star collapses under its 
own gravitational pull. Once across a certain radius (r*), gravitational collapse to a singularity 
(where size is zero and density is infinite) is inevitable. The newborn black hole can continue to 
grow by absorbing mass from the surrounding stars or merging with other black holes. So, black 
holes range from micro (M < M_☾) via stellar (M < 10M_☉) and intermediate-mass (M < 10°M_☉) to
supermassive (M < 10¹⁰M_☉) black holes, where M, M_☾, and M_☉ are the black hole, lunar, and solar
mass, respectively. The General Theory of Relativity predicts four kinds of black holes: While 
every black hole has a mass M, some also have angular momentum J or electric charge Q. The 
simplest, non-rotating (J = 0) and non-charged (Q = 0) kind is called a Schwarzschild black hole. 
Its mass M is so compressed that its radius r is less than the radius r* = 2G_N M/c² of its event
horizon. The other three kinds are called Kerr (J > 0,Q = 0), Reissner-Nordström (J = 0,Q > 0),
and Kerr-Newman (J > 0,Q > 0) black holes. Of these, Kerr black holes are the most relevant to 
realistic astrophysical situations. A Kerr black hole, due to its rotation, has the singularity spread 
out over a ring. This is surrounded by an inner event horizon (r₋) that is surrounded by an outer
event horizon (r₊) and that by a stationary limit surface. The region between the outer event horizon
and the stationary limit surface is the ergosphere, where a stationary object can approach or move
away from the event horizon.
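
To get a feeling for the sizes involved, the event-horizon radius of a non-rotating black hole, twice the gravitational constant times the mass divided by the speed of light squared, can be evaluated numerically. The sketch below is only an illustration; the rounded physical constants and the function name `schwarzschild_radius` are assumptions introduced here.

```python
G = 6.674e-11      # gravitational constant [m^3 kg^-1 s^-2], rounded
C = 2.998e8        # speed of light [m/s], rounded
M_SUN = 1.989e30   # solar mass [kg], rounded


def schwarzschild_radius(mass_kg: float) -> float:
    """Event-horizon radius 2*G*M/c^2 of a non-rotating, uncharged black hole."""
    return 2 * G * mass_kg / C**2


# A black hole of one solar mass has a horizon radius of roughly 3 km;
# the radius grows linearly with the mass.
for solar_masses in (1, 10, 1e6):
    r = schwarzschild_radius(solar_masses * M_SUN)
    print(f"{solar_masses:>9} solar masses: {r / 1000:.1f} km")
```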
Suppose that T_● is near a Schwarzschild⁶⁸ black hole and T_⊕ is on Earth.
(We denote the black hole by ● and Earth by ⊕.) Putting aside the distance between
● and ⊕, which brings communication delay due to the finite speed of electromagnetic
waves carrying signals, there is a difference in the strengths of their gravitational
fields. Now comes to the fore Einstein’s General Theory of Relativity, which tells us,
informally, that a clock in a stronger gravitational field runs more slowly. This
phenomenon is called gravitational time dilation. Here is how Cheng⁶⁹ describes it:


The time unit itself changes in the presence of gravity. The clocks run at different rates when 
situated at different gravitational field points: there is a gravitational time dilation effect. 
The clock at the higher gravitational potential point [where the field is weaker] will run faster. 
Here we are saying that two clocks, even at rest with respect to each other, run at different 
rates if the gravitational fields at their respective locations are different. The observer at a 
higher gravitational potential point [where the field is weaker] sees the lower clock [which 
is in a stronger field] run slow, and the lower observer sees the higher clock run fast. 


The gravitational time dilation is not just a theoretical result; it can be detected, 
measured, and used in reality, explains Cheng: 


The gravitational time dilation effects have been tested directly by comparing the times
kept by two cesium atomic clocks: one flown in an airplane at high altitude h (about 10 km)
in a holding pattern, for a long time τ, over the ground station where the other clock sits.
[... T]he high altitude clock was found to gain over the ground clock by a time interval of
Δτ = (gh/c²)τ in agreement with the expectation [g is the gravitational field, c is the speed
of light].
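
As a rough check of the magnitude of this effect, the sketch below evaluates the quoted gain, the factor gh/c² multiplied by the elapsed time, for an altitude of 10 km. The one-hour duration, the rounded values of g and c, and the function name `clock_gain` are assumptions chosen for illustration; they are not taken from the experiment Cheng describes.

```python
def clock_gain(height_m: float, duration_s: float,
               g: float = 9.81, c: float = 2.998e8) -> float:
    """Time gained by the higher clock over the ground clock,
    using the weak-field estimate (g*h/c^2) * duration."""
    return g * height_m / c**2 * duration_s


gain = clock_gain(height_m=10_000, duration_s=3600)
print(f"gain after one hour at 10 km: {gain * 1e9:.2f} ns")
```

The gain is a few nanoseconds per hour, which is tiny but within the reach of atomic clocks.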


In particular, near the black hole ●, time passes more slowly than on Earth ⊕. We must
therefore distinguish between the time measured by the clock C_● at T_● (called the
proper time of T_●) and the time measured by the clock C_⊕ at T_⊕ (proper time of T_⊕).
Whenever C_● ticks one second (T_● waits one second of its proper time), C_⊕ ticks
more than one second (T_⊕ simulates more than one second of its proper time).
So, from T_●’s perspective, T_⊕ carries out in its proper time t_⊕ more of the simulation
if T_● waits t_● of its proper time near the black hole (instead of on Earth).


Can T_● use this phenomenon to find the value χ_K₀(⟨m,n⟩)? Let T_● be slowly
approaching the black hole’s event horizon. Simultaneously, the gravitational field is
getting stronger, T_●’s proper time is slowing down relative to T_⊕’s proper time, C_⊕’s
ticking is speeding up relative to C_●’s ticking, and the part of the simulation carried
out in T_●’s proper time t_● is growing with no upper bound.
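
The unbounded growth can be made concrete under a simplifying assumption that is not part of the description above: suppose the machine near the black hole hovers at a fixed radius r outside a Schwarzschild horizon of radius r*, and the distant clock is far enough away for its own gravitational field to be negligible. For such a static clock, general relativity gives the ratio of local to distant time as the square root of 1 - r*/r. The sketch below (the function name is hypothetical) prints how many distant seconds pass per local second.

```python
import math


def far_seconds_per_local_second(r_over_rstar: float) -> float:
    """Distant-clock seconds elapsing per second of a static clock
    hovering at radius r = r_over_rstar * r* (Schwarzschild, r > r*)."""
    if r_over_rstar <= 1:
        raise ValueError("a static clock exists only outside the horizon")
    return 1 / math.sqrt(1 - 1 / r_over_rstar)


# The closer the hovering clock is to the horizon, the larger the ratio.
for ratio in (2.0, 1.1, 1.01, 1.0001, 1.000001):
    print(f"r = {ratio} r*: {far_seconds_per_local_second(ratio):.1f}")
```

The ratio exceeds any given bound once r is close enough to r*, which matches the qualitative claim that a short wait near the horizon corresponds to an arbitrarily long simulation far away.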


If in truth T_m halts on n, then there is a finite time t_⊕ in which T_⊕ will terminate the
simulation and signal HALTED to T_●. (Even if t_⊕ is colossal, T_● will wait for the signal
only a reasonable proper time t_● if T_● has sufficiently approached the event horizon.)
Then T_● turns back, thus making the final result χ_K₀(⟨m,n⟩) = 1 on its tape obtainable
by an outside observer. In sum, if in truth χ_K₀(⟨m,n⟩) = 1, then T_● can learn
this in a reasonable proper time t_●. The observer learns this with a finite time delay.


68 Karl Schwarzschild, 1873-1916, German physicist and astronomer. 


69 Ta-Pei Cheng, b. 1941, Chinese-American particle physics theorist and author of books on Ein-
stein’s physics, relativity, and cosmology as well as particle physics. 
What if in truth T_m does not halt on n? Can T_● decide that the simulation never
terminates? Since the simulation doesn’t terminate, T_● receives in its proper time t_●
no signal from T_⊕ and continues approaching the event horizon with the provisional
result χ_K₀(⟨m,n⟩) = 0 on its tape. But T_● cannot detect the exact moments of reaching
and crossing the event horizon since there are no observables characterizing the
two events (although T_●’s proper time runs smoothly and T_● can measure it). After
T_● has crossed the event horizon, the provisional result χ_K₀(⟨m,n⟩) = 0 on its tape
tacitly but correctly becomes final (although T_● cannot detect when that happens). In
sum, if in truth χ_K₀(⟨m,n⟩) = 0, then T_● can obtain it in a reasonable proper time t_●.
But how could T_● decide that the provisional result χ_K₀(⟨m,n⟩) = 0 is in fact final?
This problem is addressed by introducing a time bound, b, on T_●’s proper time t_●.
When T_● reaches the event horizon, the gravitational field becomes so strong that,
from T_⊕’s perspective, T_●’s time stops (though it continues from T_●’s perspective).
Thus, from T_⊕’s perspective, T_●’s crossing of the event horizon takes infinitely long.
Suppose that during the simulation T_⊕ rarely but periodically signals SIMULATING
to T_●. Since the whole infinite time span of T_⊕’s simulation matches some finite time
span of T_●’s waiting, T_● will receive all these signals in this finite time span. If T_●
picked a large enough finite b, then, when T_●’s proper time reaches b, T_● will
determine that the simulation must be over and the result χ_K₀(⟨m,n⟩) = 0 is final.


Let us now focus on the physical realizability of the physical system (T_⊕, T_●).
How can T_● approach the event horizon and survive spaghettification? The answer is
offered by Kerr black holes.⁷⁰ These black holes rotate (in contrast to Schwarzschild
black holes), so the centrifugal forces acting on the synchronously rotating T_● reduce
the destructive effect of tidal forces on it. In addition, the Kerr black hole must be
large enough, because larger black holes induce weaker tidal forces (due to their
large radii). Then T_● can peacefully travel towards the event horizon, free from
danger of being torn up. Németi further elaborated a possible use for the structure of 
Kerr black holes by taking into account the two event horizons and the ergosphere 
between the outer horizon and the stationary limit surface. (The reader can find 
further details in the Bibliographic Notes to this chapter.) 


We conclude this description of Németi’s relativistic machine with a remark. 
When T_● reaches the event horizon, the black hole irrevocably consumes T_●
together with its final result χ_K₀(⟨m,n⟩) = 0. After that, nothing, not even
electromagnetic waves carrying this result, can escape through the event horizon. To an
external observer the final result χ_K₀(⟨m,n⟩) = 0 (even if known to T_●) is irretrievable.
It seems that the described relativistic hypercomputation of the value χ_K₀(⟨m,n⟩)
does not completely fulfill Piccinini’s constraints about physical computation. 


Nevertheless, Németi’s construction demonstrates that there is a fascinating in- 
terplay between Computability Theory and the General Theory of Relativity that 
might lead to new discoveries about computing. 


70 Roy Kerr, b. 1934, New Zealand physicist who in 1963 found the metric for a rotating black hole. 
supporting the Church-Turing Thesis was given by Kleene [124, pp. 319-323] and Shoenfield
Mendelson [156] renounced the standard view that the Church-Turing Thesis cannot be proved.
(See also recently Mendelson [157].) Thus, Sieg [222, 225] argued that the Church-Turing The-
Copeland [45]. Davis [56] critically discusses various claims made by researchers of hyper-
overview of the ideas presented in this chapter is Davis [55]. Copeland and Shagrir [48] present
fresh interpretations of the positions of Turing and Gödel on computability and the mind.
Copeland, Posy, and Shagrir [46] is an excellent recent collection of chapters on various as- 
pects of computability written by renowned present-day logicians, mathematicians, computer 
scientists, and philosophers. 

• A recent brief overview of various theses about computability ranging from the original
Church-Turing Thesis to the present-day versions is Copeland and Shagrir [49]. The formu- 
lation CTT-O of the original Church-Turing Thesis is from [49, p. 68]. 

• Lewis and Papadimitriou’s algorithmic version of the Church-Turing Thesis is from Lewis and
Papadimitriou [142, p. 223]. Harel’s algorithmic version of the Church-Turing Thesis is from 
Harel and Feldman [96, p. 228]. The development of the theory of algorithms within axiomatic 
set theory is described in Moschovakis [160, 161], and the relation between an implementable 
algorithm and its implementations is given in Moschovakis and Paschalis [162, pp. 87-118]. 
The algorithmic version CTT-A is stated in [49, p. 68]. 

• The Bernstein-Vazirani complexity-theoretic version of the Church-Turing Thesis was given
special prominence in [19]. The Extended Church-Turing Thesis CTT-E is stated and discussed 
in Aharonov and Vazirani [7]. 

• The physical version of the Church-Turing Thesis was formulated in Wolfram [278] and a
similar thesis in Deutsch [61]. The Total Physical Computability Thesis and its modest version
CTT-P-C appeared in Copeland and Shagrir [49]. Undecidability of the problem OUTCOME SE- 
QUENCES was proved in Eisert et al. [66] and that the SPECTRAL GAP problem is undecidable 
was proved in Cubitt et al. [50]. The classification of physical versions of the Church-Turing 
Thesis into the bold and modest ones is described in Piccinini [178]. 

• The accelerating Turing machine and its operation were described in Copeland and Shagrir
[47]. The relativistic machine was devised in Németi and David [169]. For the background in the
General Theory of Relativity, in particular gravitational time dilation and black hole spacetime
geometry, we consulted the following excellent sources: Carroll [31], Cheng [34], Guidry [92], 
Hartle [97], Lambourne [137], and Schutz [212]. For a critical account of hypercomputing see 
Davis [56]. 

• Sieg [226] is an in-depth treatise on the historical developments of computability theory that are
deeply intertwined with metamathematical work in the foundations of mathematics. Sieg [229]
discusses the philosophical challenge to answer whether there exists a rigorous argument from
Gödel’s incompleteness theorems to the claim that machines can never replace mathematicians
(or, more generally, that the human mind infinitely surpasses any finite machine). 

• A nice background on the contemporary varieties and aspects of computation is given in Shagrir
[213]. Sieg [224] addresses the question of whether there are strictly broader notions of effec- 
tiveness in view of mathematical reasoning that transcends mechanical computing. 


Chapter 17
Further Reading


[Figure: chapter-opening illustration with a short dialogue ("Gee, is that all?" "Oh no, no, Becky ...")
and the names of mathematicians and logicians, including Euclid, Pascal, Babbage, Boole, Frege, Peano,
Cantor, Hilbert, Whitehead, Brouwer, Zermelo, Fraenkel, Gödel, Church, Kleene, Post, and Rice.]


This text has been designed so that it can serve as a stepping stone to a more 
advanced study of Computability Theory, or as an introduction to Computational 
Complexity Theory. Here are some suggestions for further reading. 


Robert I. Soare, Turing Computability: Theory and Applications, Theory and 
Applications of Computability, Springer (2016). 


Soare’s recent monograph emphasizes three important concepts: computability 
(or effective calculability), Turing (or classical) computability, and the art (or math- 
ematical aesthetic) of computability. The monograph is a survey of the results in 
Computability Theory up to the mid-2010s. Through the parts on the foundations of 
computability theory, computably open and closed sets of reals, minimal Turing de- 
grees, and games in computability theory it will bring you closer to the frontiers of 
the current research in this theory. In the last, fifth part you will find a short history 
of computability theory. 
Robert I. Soare, Recursively Enumerable Sets and Degrees, Springer (1987).


Soare’s monograph is a concise survey of Computability Theory during the peri- 
ods 1931-1943, 1944-1960, and 1961-1987. It will deepen and broaden the funda- 
mental concepts that you are now acquainted with, and bring you deep into Post’s 
Problem, oracle constructions, and finitary and infinitary methods for constructing 
c.e. sets and degrees. There are many exercises and you will find a lot of information 
there. Proofs will often demand additional work. 


S. Barry Cooper, Computability Theory, Chapman & Hall/CRC Mathematics 
(2004). 


Cooper’s monograph consists of three parts. Being now acquainted with the funda- 
mental concepts of Computability Theory, you should have no problems with the 
first part. But carefully reading it, you will complement your current knowledge and 
view many issues from different perspectives. The second part starts with oracle 
computations, which should be easy for you, and proceeds to topics fundamental to 
Computational Complexity Theory. The third part brings in advanced topics about 
degree structures, forcing, determinacy, and applications to mathematics and sci- 
ence. There are many examples and exercises. 


Rebecca Weber, Computability Theory, Student Mathematical Library, vol. 62, 
American Mathematical Society (2012). 


Weber’s monograph, similarly to Cooper’s, will complement your current knowl- 
edge and present it from other perspectives. The second half of the monograph will 
give you additional explanation of methods, tools, and the arithmetical hierarchy, 
and the last chapter will give you a taste of various areas of Computability Theory 
where research is currently active. 


Hartley Rogers, Theory of Recursive Functions and Effective Computability, 2nd 
ed., MIT Press (1987). 


Rogers’s monograph is a complete and concise presentation of classical Com- 
putability Theory up to 1970 or so. Its central concerns are the theories of com- 
putably enumerable (c.e.) sets, of degrees of unsolvability, and of Turing degrees in 
particular. The subjects are developed in a succinct, mathematically mature manner 
free from the details of any particular formalism. Because of the clear exposition 
at a level accessible to the reader with little training in logic or other special math- 
ematics subjects, the monograph became one of the most influential textbooks on 
Computability Theory in the 1970s and 1980s. There are many exercises and a lot 
of additional information that will augment your current knowledge. 
Rodney G. Downey and Denis R. Hirschfeldt, Algorithmic Randomness and
Complexity, Theory and Applications of Computability, Springer (2010). 


Downey-Hirschfeldt’s monograph will introduce you to a research area that has been 
flourishing since the late 1990s. It will explain to you how relative computability, 
information content, and randomness interact. 


André Nies, Computability and Randomness, Oxford Logic Guides, Oxford Uni- 
versity Press (2009). 


Nies’s monograph will explain to you how Computability Theory is used in the study 
of randomness of sets of natural numbers; conversely, it will show you how ideas 
originating from randomness are used to enrich Computability Theory. You will find 
many advanced topics that will extend your current knowledge. 


Piergiorgio Odifreddi, Classical Recursion Theory: The Theory of Functions and 
Sets of Natural Numbers, volumes I and II, 2nd ed., Elsevier (1999).


Odifreddi’s two-volume monograph contains a wealth of information. The author 
has opted for breadth rather than depth so the book provides rudiments of many 
branches of classical Computability Theory. It is a good reference for those with a 
moderate background. 


Appendix A 
Mathematical Background 


In this appendix, we review the basic notions, concepts, and facts of logic, set theory, algebra, 
analysis, and formal-language theory that are used throughout this book. For further details see, 
for example, [106, 155, 196] for logic, [88, 95, 133] for set theory, [64, 115, 179] for algebra, 
[79, 205, 231] for formal languages, and [118, 206, 248] for mathematical analysis. 


Propositional Calculus P 


Syntax 


• An expression of P is a finite sequence of symbols. Each symbol denotes either an individual
constant or an individual variable, or it is a logic connective or a parenthesis. Individual
constants are denoted by a,b,c,... (possibly indexed). Individual variables are denoted by
x,y,z,... (possibly indexed). Logic connectives are ∨, ∧, ⇒, ⇔, and ¬; they are called disjunction,
conjunction, implication, equivalence, and negation, respectively. Punctuation marks are
parentheses.

• Not every expression of P is well formed. An expression of P is well formed if it is either
1) an individual-constant or individual-variable symbol, or 2) one of the expressions F ∨ G,
F ∧ G, F ⇒ G, F ⇔ G, and ¬F, where F and G are well-formed expressions of P. A well-formed
expression of P is called a sentence.


Axioms and Rules of Inference 


• If F, G, H are arbitrary sentences, the following are axioms of P:

  – F ⇒ (G ⇒ F)
  – (F ⇒ (G ⇒ H)) ⇒ ((F ⇒ G) ⇒ (F ⇒ H))
  – (¬G ⇒ ¬F) ⇒ ((¬G ⇒ F) ⇒ G)


• The only rule of inference is Modus Ponens: G is a direct consequence of F and F ⇒ G.


Semantics


• The standard meanings of the logic connectives are: “or” (∨), “and” (∧), “implies” (⇒), “if and
only if” (⇔), “not” (¬).

• Let {⊤,⊥} be a set. The elements ⊤ and ⊥ are called logic values and stand for “true” and
“false”, respectively. Often, 1 and 0 are used instead of ⊤ and ⊥, respectively.
• Any sentence has either the truth-value ⊤ or ⊥. A sentence is said to be true if its truth-value
is ⊤, and false if its truth-value is ⊥. Individual constants and individual variables obtain their
truth-values by assignment. When logic connectives combine sentences into new sentences, the
truth-value of the new sentence is determined by the truth-values of its component sentences.
Specifically, let E and F be sentences. Then:

  – ¬E is true if E is false, and ¬E is false if E is true.
  – E ∨ F is false if both E and F are false; otherwise E ∨ F is true.
  – E ∧ F is true if both E and F are true; otherwise E ∧ F is false.
  – E ⇒ F is false if E is true and F is false; otherwise E ⇒ F is true.
  – E ⇔ F is true if E and F are either both true or both false; otherwise, E ⇔ F is false.


• The following hold: ¬(E ∨ F) ⇔ (¬E) ∧ (¬F) and ¬(E ∧ F) ⇔ (¬E) ∨ (¬F).
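
The truth-value rules above can be checked mechanically by enumerating all assignments. The following is a minimal sketch, not part of the original text: it assumes that sentences are modeled as Python functions of Boolean arguments, and the names NOT, OR, AND, IMPLIES, IFF and is_tautology are illustrative only.

```python
from itertools import product

# Truth functions for the connectives, following the semantics of P.
def NOT(e): return not e
def OR(e, f): return e or f
def AND(e, f): return e and f
def IMPLIES(e, f): return (not e) or f
def IFF(e, f): return e == f

def is_tautology(sentence, arity):
    """True if `sentence` (a function of `arity` truth-values) is true
    under every assignment of truth-values to its arguments."""
    return all(sentence(*values)
               for values in product([True, False], repeat=arity))

# The three axiom schemes of P, with F, G, H ranging over truth-values.
axiom1 = lambda F, G: IMPLIES(F, IMPLIES(G, F))
axiom2 = lambda F, G, H: IMPLIES(IMPLIES(F, IMPLIES(G, H)),
                                 IMPLIES(IMPLIES(F, G), IMPLIES(F, H)))
axiom3 = lambda F, G: IMPLIES(IMPLIES(NOT(G), NOT(F)),
                              IMPLIES(IMPLIES(NOT(G), F), G))

# The two equivalences stated in the last item (De Morgan's laws).
de_morgan1 = lambda E, F: IFF(NOT(OR(E, F)), AND(NOT(E), NOT(F)))
de_morgan2 = lambda E, F: IFF(NOT(AND(E, F)), OR(NOT(E), NOT(F)))
```

Calling is_tautology(axiom2, 3) enumerates all eight assignments; a sentence such as E ⇒ F alone is rejected because it is false when E is true and F is false.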


First-Order Logic L 


Syntax 


• An expression of L is a finite sequence of symbols, where each symbol is an individual-constant
symbol (e.g., a,b,c), an individual-variable symbol (e.g., x,y,z), a logic connective (∨, ∧, ⇒,
⇔, ¬), a function symbol (e.g., f,g,h), a predicate symbol (e.g., P,Q,R), a quantification
symbol (∀, ∃), or a punctuation mark (e.g., colon, parenthesis). (Predicates are also called relations.)

• We are only interested in the well-formed expressions of L. To define these, we need two
definitions. First, a term is either 1) an individual-constant or individual-variable symbol, or 2) a
function symbol applied to terms (e.g., f(a,x)). Second, an atomic formula is a predicate
symbol applied to terms (e.g., P(y, f(a,x))). Finally, we say that an expression of L is well formed
if it is either 1) an atomic formula, or 2) one of the expressions F ∨ G, F ∧ G, F ⇒ G, F ⇔ G, ¬F,
∀τF, and ∃τF, where F and G are well-formed expressions of L and τ is an individual-variable
symbol. A logic expression of L that is well formed is called a formula.


Axioms and Rules of Inference 


• If F, G, H are arbitrary formulas, the following are axioms of L:


– F ⇒ (G ⇒ F)
– (F ⇒ (G ⇒ H)) ⇒ ((F ⇒ G) ⇒ (F ⇒ H))
– (¬G ⇒ ¬F) ⇒ ((¬G ⇒ F) ⇒ G)
– ∀xF(x) ⇒ F(t) if t is a term free for x in F(x)
– ∀x(F ⇒ G) ⇒ (F ⇒ ∀xG) if x is not free in F


• The rules of inference are:


– Modus Ponens: G follows from F and F ⇒ G.
– Generalization: ∀xF follows from F.


Semantics 


• The standard meanings of the quantification symbols are: “for all” (∀), “exists” (∃). For the
meanings of logic connectives, see Propositional Calculus P above.

• The truth-value of a formula is determined as follows. Let F and G be formulas. Then:


– ∀τF is true if F is true for every possible assignment of a value to τ.
– ∃τF is true if F is true for at least one possible assignment of a value to τ.
– For the truth-values of F ∨ G, F ∧ G, F ⇒ G, F ⇔ G, ¬F, see Propositional Calculus P above.
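
When the values assignable to a variable form a finite set, the two quantifier rules above can be evaluated directly. The sketch below rests on that assumption (in general the domain of a structure is infinite and no such exhaustive check exists); the names for_all and exists and the sample domain are illustrative, not from the text.

```python
def for_all(domain, formula):
    """Truth-value of 'for all t: formula(t)' over a finite domain."""
    return all(formula(value) for value in domain)

def exists(domain, formula):
    """Truth-value of 'there exists t: formula(t)' over a finite domain."""
    return any(formula(value) for value in domain)

# Illustrative finite domain and predicate.
domain = range(10)
is_even = lambda x: x % 2 == 0

# Order of quantifiers matters: compare the two nested formulas.
every_x_has_upper_bound = for_all(
    domain, lambda x: exists(domain, lambda y: y >= x))
some_y_exceeds_every_x = exists(
    domain, lambda y: for_all(domain, lambda x: y > x))
```

Here every_x_has_upper_bound holds, whereas some_y_exceeds_every_x fails because no element of the domain is strictly greater than itself.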


Sets


Basics 


Given any objects a₁,...,aₙ, the set containing a₁,...,aₙ as its only elements is denoted by
{a₁,...,aₙ}. More generally, given a property P, the set of those elements having the property
P is written as {x | P(x)}. If an element x is in a set A, we say that x is a member of A and
write x ∈ A; otherwise, we write x ∉ A and say that x is not a member of A. The set with no
members is called the empty set and denoted by ∅.

For sets A and B, we say that A is a subset of B, written A ⊆ B, if each member of A is also a
member of B. A set A is a proper subset of B, written A ⊊ B, if A ⊆ B but there is a member
of B not in A. Instead of ⊊ we also write ⊂.

Sets A and B are equal, written A = B, if A ⊆ B and B ⊆ A.

Given a set A = {xᵢ | i ∈ I}, the set I is called the index set of A.

By (a₁,...,aₙ), or also by ⟨a₁,...,aₙ⟩, we denote the ordered n-tuple of objects a₁,...,aₙ.
When n = 2, the n-tuple is called an ordered pair. Two ordered n-tuples (a₁,...,aₙ) and
(b₁,...,bₙ) are equal, denoted by (a₁,...,aₙ) = (b₁,...,bₙ), if aᵢ = bᵢ for i = 1,...,n.


Operations on Sets 


The union of sets A and B, written as A ∪ B, is the set of elements that are members of at least
one of A and B.

The intersection of sets A and B, written as A ∩ B, is the set of elements that are members of
both A and B. We say that A and B are disjoint if A ∩ B = ∅.

The difference of sets A and B, written as A − B, is the set of those members of A that are not
in B.

If A ⊆ B, then the complement of A with respect to B is the set B − A.

The power set of a set A is the set of all subsets of A and is denoted by 2^A.

The Cartesian product of a finite sequence of sets A₁,...,Aₙ is the set of all ordered n-tuples
(a₁,...,aₙ), where aᵢ ∈ Aᵢ for each i. In this case it is denoted by A₁ × ... × Aₙ. If A₁ = ... =
Aₙ = A, the Cartesian product is denoted by Aⁿ. By convention, A¹ stands for A.
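
For finite sets, the operations defined above correspond closely to Python's built-in set type. The following sketch is illustrative only: the helper names power_set and cartesian_power and the sample sets are assumptions, and subsets are represented as frozensets so that they can themselves be members of a set.

```python
from itertools import chain, combinations, product

def power_set(a):
    """All subsets of the finite set `a`, returned as a set of frozensets."""
    items = list(a)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))
    return {frozenset(s) for s in subsets}

def cartesian_power(a, n):
    """The set of all ordered n-tuples of members of `a`."""
    return set(product(a, repeat=n))

A = {1, 2, 3}
B = {3, 4}

union = A | B                       # members of at least one of A and B
intersection = A & B                # members of both A and B
difference = A - B                  # members of A that are not in B
pairs = set(product(A, B))          # Cartesian product of A and B
```

For a finite set with n members the power set has 2ⁿ members, which is one motivation for the notation 2^A. Note that cartesian_power(A, 1) yields 1-tuples such as (1,), whereas the convention in the text identifies A¹ with A itself.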


Relations 


Basics 


An n-ary relation on a set A is a subset of Aⁿ. When n = 2 we say that the relation is binary, or
for short, a relation. If R is a relation, we write xRy to indicate that (x,y) ∈ R. A 1-ary relation
on A is a subset of A, and is called a property on A.

A relation R on A is:


– reflexive if xRx for each x ∈ A.

– irreflexive if xRx for no x ∈ A.

– symmetric if xRy implies yRx, for arbitrary x,y ∈ A.

– asymmetric if xRy implies that not yRx, for arbitrary x,y ∈ A.

– anti-symmetric if xRy and yRx imply x = y, for arbitrary x,y ∈ A.

– transitive if xRy and yRz imply xRz, for arbitrary x,y,z ∈ A.
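
On a finite set, each of the properties listed above can be tested by exhaustive search. The sketch below assumes that a binary relation is represented as a Python set of ordered pairs; the function names and the two sample relations are illustrative.

```python
def is_reflexive(A, R):
    return all((x, x) in R for x in A)

def is_irreflexive(A, R):
    return all((x, x) not in R for x in A)

def is_symmetric(A, R):
    return all((y, x) in R for (x, y) in R)

def is_asymmetric(A, R):
    return all((y, x) not in R for (x, y) in R)

def is_antisymmetric(A, R):
    return all(x == y for (x, y) in R if (y, x) in R)

def is_transitive(A, R):
    # For every chain xRy and yRz, the pair (x, z) must also be in R.
    return all((x, z) in R
               for (x, y) in R for (w, z) in R if y == w)

A = {1, 2, 3, 4, 6, 12}
divides = {(x, y) for x in A for y in A if y % x == 0}
less_than = {(x, y) for x in A for y in A if x < y}
```

The relation divides is reflexive, anti-symmetric, and transitive; less_than is irreflexive, asymmetric, and transitive.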


Ordered Sets 


• A preordered set is a pair (A,R) where A is a set and R a binary relation on A such that (i) R
is reflexive, and (ii) R is transitive. In this case, we say that R is a preorder on A. Two elements
x,y ∈ A are incomparable by R (for short, R-incomparable) if neither xRy nor yRx.

• A partially ordered set is a pair (A,R) where A is a set and R a binary relation on A such
that (i) R is reflexive, (ii) R is transitive, and (iii) R is anti-symmetric. In this case, we say that
R is a partial order on A. A partial order is often denoted by ≼, ≤, or any other symbol
indicating the properties of this order.

• Let (A,≼) be a partially ordered set. The relation ≺ on A is the strict partial order
corresponding to ≼ if a ≺ b ⇔ a ≼ b ∧ a ≠ b, for arbitrary a,b ∈ A. We say that ≺ is the irreflexive
reduction of ≼. Conversely, ≼ is the reflexive closure of ≺, since a ≼ b ⇔ a ≺ b ∨ a = b.

• Let (A,≼) be a partially ordered set and a,b ∈ A. When a ≼ b, we say that a is smaller than or
equal to (or lower than or equal to) b. Correspondingly, we say that b is larger than or equal
to (or higher than or equal to) a. When a ≺ b, we say that a is smaller than (or lower than, or
below) b. Correspondingly, we say that b is larger than (or higher than, or above) a.

• Let (A,≼) be a partially ordered set and a,b,c,d ∈ A. Then we say:


– a is ≼-minimal if x ≼ a implies x = a for all x ∈ A (nothing in A is smaller than a).
– b is ≼-least if b ≼ x for all x ∈ A (b is smaller than any other in A).
– c is ≼-maximal if c ≼ x implies x = c for all x ∈ A (nothing in A is greater than c).
– d is ≼-greatest if x ≼ d for all x ∈ A (d is greater than any other in A).


When the relation ≼ is understood, we can drop the prefix “≼-”. The least and greatest elements
are called the zero (0) and unit (1) element, respectively.

• Let (A,≼) be a partially ordered set, B ⊆ A, and u,v,w,z ∈ A. Then we say:


– u is a ≼-upper bound of B if x ≼ u for all x ∈ B.

– v is a ≼-least upper bound (or ≼-lub) of B if v is a ≼-upper bound of B and v ≼ u for every
≼-upper bound u of B.

– w is a ≼-lower bound of B if w ≼ x for all x ∈ B.

– z is a ≼-greatest lower bound (or ≼-glb) of B if z is a ≼-lower bound of B and w ≼ z for
every ≼-lower bound w of B.


When the relation ≼ is understood, we can drop the prefix “≼-”.

• A lattice is a partially ordered set (A,≼) in which any two elements have an lub and a glb. The
lub of a,b ∈ A is denoted by a ∨ b, and the glb by a ∧ b. An upper semi-lattice is a partially
ordered set (A,≼) in which any two elements have an lub (but not necessarily a glb).

• A linearly (or totally) ordered set is a partially ordered set (A,≼) such that for all x,y ∈ A
either x ≼ y or y ≼ x. In this case we say that ≼ is a linear order on A.

• A well-ordered set is a linearly ordered set (A,≼) such that every nonempty subset of A has a
≼-least element. We say that such a ≼ is a well-order on A.

• Associated with every well-ordered set (A,≼) is the corresponding Principle of Complete
Mathematical Induction: If P is a property such that, for any b ∈ A, P(b) holds whenever P(a)
holds for all a ∈ A such that a ≺ b, then P(x) holds for all x ∈ A. When A is infinite, a proof
using this principle is called a proof by transfinite induction.
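
The definitions of upper and lower bounds, lub, glb, and lattice given above can be tried out on a small finite partially ordered set. The sketch below is an illustration under assumptions: the order is passed as a Python predicate leq(x, y), the sample order is divisibility on the divisors of 12, and all function names are hypothetical.

```python
def upper_bounds(A, leq, B):
    return {u for u in A if all(leq(x, u) for x in B)}

def lower_bounds(A, leq, B):
    return {w for w in A if all(leq(w, x) for x in B)}

def least(S, leq):
    """The least element of S with respect to leq, or None if there is none."""
    for c in S:
        if all(leq(c, x) for x in S):
            return c
    return None

def greatest(S, leq):
    """The greatest element of S with respect to leq, or None if there is none."""
    for c in S:
        if all(leq(x, c) for x in S):
            return c
    return None

def lub(A, leq, B):
    return least(upper_bounds(A, leq, B), leq)

def glb(A, leq, B):
    return greatest(lower_bounds(A, leq, B), leq)

def is_lattice(A, leq):
    # Any two elements must have both an lub and a glb.
    return all(lub(A, leq, {a, b}) is not None and
               glb(A, leq, {a, b}) is not None
               for a in A for b in A)

divisors_of_12 = {1, 2, 3, 4, 6, 12}
divides = lambda x, y: y % x == 0
```

In this example the lub of 4 and 6 is their least common multiple 12 and the glb is their greatest common divisor 2. The set {1, 2, 3} under the same order is not a lattice, since 2 and 3 have no upper bound in it.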


Equivalence Relations 


• A relation R on A is an equivalence relation if (i) R is reflexive, (ii) R is symmetric, and (iii) R
is transitive. In this case, the R-equivalence class of a ∈ A is the set {x ∈ A | xRa}. Elements of
the R-equivalence class of a are said to be R-equivalent to a. If C is an equivalence class, any
element of C is called a representative of the class C.

• A partition of A is any collection {Aᵢ | i ∈ I} of nonempty subsets of A such that (i) A =
⋃_{i∈I} Aᵢ, and (ii) Aᵢ ∩ Aⱼ = ∅, for all i, j ∈ I with i ≠ j. So, A is the disjoint union of the sets
in the partition.

• Any equivalence relation on A is associated with a partition of A, and vice versa. If R is an
equivalence relation on A, then the associated partition of A is called the quotient set of A
relative to R and is denoted by A/R. The members of A/R are the R-equivalence classes of A.
The function f : A → A/R that associates with each element a ∈ A the R-equivalence class of
a is called the natural map of A relative to R.

• An equivalence relation is often denoted by ∼, ≈, ≡, or any other symbol indicating the
properties of this relation.
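
The passage from an equivalence relation to its quotient set and natural map can be sketched for a finite set as follows. The representation (a predicate related(x, y), classes as frozensets) and the sample relation, congruence modulo 3, are assumptions made for illustration.

```python
def equivalence_class(A, related, a):
    """The set of all x in A that are related to a."""
    return frozenset(x for x in A if related(x, a))

def quotient_set(A, related):
    """The partition of A into equivalence classes."""
    return {equivalence_class(A, related, a) for a in A}

def natural_map(A, related):
    """The map sending each a in A to its equivalence class."""
    return {a: equivalence_class(A, related, a) for a in A}

A = set(range(10))
congruent_mod_3 = lambda x, y: (x - y) % 3 == 0

classes = quotient_set(A, congruent_mod_3)
```

Here classes consists of three pairwise disjoint sets whose union is A, and any member of a class (for example 4 in the class {1, 4, 7}) may serve as its representative.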


Functions


Basics


• A total function f from A into B is a triple (A,B,f) where A and B are nonempty sets and
for every x ∈ A there is a unique member, denoted by f(x), of B. We call A the domain of f
and denote it by dom(f). The set B we call the co-domain of f and denote it by codom(f). We
usually write f : A → B instead of (A,B,f). A function is also called a mapping.


• In specifying a definition of f : A → B we say that f is well defined if we are assured that f is
single-valued, i.e., with each member of A, f associates a unique member of B.

• When the domain of a function consists of ordered n-tuples, the function is said to be of n
arguments. A (total) function of n arguments on a set S is a function f whose domain is Sⁿ.
We write f(a₁,...,aₙ) instead of f((a₁,...,aₙ)).

• Let f : A → B and C ⊆ A. The image of C under f is a set denoted by f(C) and defined by
f(C) = {f(x) | x ∈ C}. In particular, f(A) is called the range of f and denoted by rng(f).

• A function f : A → B is:

– injective if f(x) ≠ f(y) whenever x ≠ y; we also say that such an f is an injection.
– surjective if f(A) = B; we also say that such an f is a surjection.
– bijective if it is injective and surjective; we also say that such an f is a bijection.

• An element a ∈ A is called a fixed point of a function f : A → A if f(a) = a.

• Let f : A → B and C ⊆ A. Then a function g : C → B is the restriction of f to C if g(x) = f(x)
for each x ∈ C. The restriction of f to C is denoted by f|C. In that case f is an extension of
g to A.

• Let f : A → B and g : C → D be functions and f(A) ⊆ C. Then the composite function (or
composition) of f and g, denoted by g ∘ f, is the function g ∘ f : A → D defined by (g ∘ f)(x) =
g(f(x)), for each x ∈ A.

• Let A and U be sets, and let A ⊆ U. The characteristic function of A is the function χ_A : U →
{0,1} such that χ_A(u) = 1 if u ∈ A and χ_A(u) = 0 if u ∉ A.

• Let A and B be nonempty sets. Then the set of all functions having the domain A and
co-domain B is denoted by B^A.
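
For finite domains, the notions above can be illustrated by representing a total function as a Python dict from arguments to values. This representation, the helper names, and the two sample functions are assumptions made for the sketch, not part of the text.

```python
def is_injective(f):
    return len(set(f.values())) == len(f)

def is_surjective(f, codomain):
    return set(f.values()) == set(codomain)

def is_bijective(f, codomain):
    return is_injective(f) and is_surjective(f, codomain)

def compose(g, f):
    """The composite g after f; requires the range of f to lie in dom(g)."""
    return {x: g[f[x]] for x in f}

def restrict(f, C):
    """The restriction of f to a subset C of its domain."""
    return {x: f[x] for x in C}

def fixed_points(f):
    return {x for x in f if f[x] == x}

def characteristic(A, U):
    """The characteristic function of A as a subset of U."""
    return {u: 1 if u in A else 0 for u in U}

A = {0, 1, 2, 3}
succ = {x: (x + 1) % 4 for x in A}      # a bijection on A
square = {x: (x * x) % 4 for x in A}    # neither injective nor surjective
```

The function square maps both 0 and 2 to 0, so it is not injective, and its range {0, 1} is a proper subset of A, so it is not surjective either.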

Cardinality 

• Two sets A and B are said to be equinumerous (or equipollent or of the same power), which is
denoted by A ∼ B, if there exists a bijection f : A → B. In that case we say that A and B have
the same cardinal number. The cardinal number of a set A is denoted by |A|. The relation ∼
is an equivalence relation.

• A cardinal number |A| is smaller than a cardinal number |B|, written |A| < |B|, if there is an
injection f : A → B, but A and B are not equinumerous.

• Cantor’s Theorem states that |A| < |2^A|, for any set A.

• A set A is

– finite if either A = ∅ or A ∼ {1,2,...,n} for some natural n;

– infinite if it is not finite;

– countable (or enumerable, or denumerable) if A ∼ B for some B ⊆ ℕ; when B = ℕ, the set
A is said to be countably infinite;

– uncountable if it is not countable.

• If a set A is infinite, then there is B ⊊ A such that B ∼ A.

• Any subset of a countable set is countable. The union of countably many countable sets is
countable. The Cartesian product of two countable sets is countable.

• Let n be a natural number, ℵ₀ = |ℕ|, and c = |ℝ| the cardinality of continuum. Then: ℵ₀ + n =
ℵ₀, ℵ₀ + ℵ₀ = ℵ₀, n · ℵ₀ = ℵ₀ and ℵ₀ⁿ = ℵ₀ (for n ≥ 1), c + ℵ₀ = c, and ℵ₀ · c = c.

• A sequence is a function f defined on ℕ, the set of natural numbers. If we write f(n) = xₙ, for
n ∈ ℕ, we also denote the sequence f by {xₙ}, or by x₀,x₁,x₂,... When xₙ ∈ A for all n ∈ ℕ,
we say that {xₙ} is a sequence of elements of A. The elements of any at most countable set can
be arranged in a sequence.

• The cardinality of B^A, the set of all functions mapping A into B, is |B|^|A|.
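
The statement above that the Cartesian product of two countable sets is countable can be made concrete for ℕ × ℕ by exhibiting an explicit bijection onto ℕ. The sketch below uses the Cantor pairing function, a standard choice that the text itself does not name; the function names pair and unpair are illustrative.

```python
from math import isqrt

def pair(x, y):
    """Cantor pairing: enumerate pairs of naturals diagonal by diagonal."""
    return (x + y) * (x + y + 1) // 2 + y

def unpair(z):
    """Inverse of pair: recover (x, y) from the code z."""
    w = (isqrt(8 * z + 1) - 1) // 2     # index of the diagonal x + y = w
    y = z - w * (w + 1) // 2            # position within that diagonal
    return w - y, y
```

Listing unpair(0), unpair(1), unpair(2), ... arranges all pairs of natural numbers in a sequence without repetition, which is exactly what countability of ℕ × ℕ requires.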


Operations and Algebraic Structures 


• An n-ary operation on a set A is a function ∗ : Aⁿ → A. When n = 2, we say that the operation
is binary. In this case we write a ∗ b instead of ∗(a,b). When n = 1, the operation is said to be
unary and we write a^∗ instead of ∗(a).

• A binary operation on a set A is:


– associative if a ∗ (b ∗ c) = (a ∗ b) ∗ c, for all a,b,c ∈ A.
– commutative if a ∗ b = b ∗ a, for all a,b ∈ A.


• A semigroup is a pair (A,∗), where ∗ is an associative binary operation on A.

• A group is a semigroup (A,∗) satisfying the following requirements:


– there exists an element e ∈ A such that a ∗ e = e ∗ a = a, for all a ∈ A (e is called the identity
of A);

– for each a ∈ A there exists an element a⁻¹ ∈ A such that a ∗ a⁻¹ = a⁻¹ ∗ a = e (a⁻¹ is called
the inverse of a).
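
For a finite set, the semigroup and group requirements above can be verified by brute force. The following sketch assumes the operation is given as a two-argument Python function; the helper names and the examples (arithmetic modulo 5) are illustrative.

```python
def is_closed(A, op):
    return all(op(a, b) in A for a in A for b in A)

def is_associative(A, op):
    return all(op(a, op(b, c)) == op(op(a, b), c)
               for a in A for b in A for c in A)

def is_semigroup(A, op):
    return is_closed(A, op) and is_associative(A, op)

def identity(A, op):
    """The identity element of (A, op), or None if there is none."""
    for e in A:
        if all(op(a, e) == a and op(e, a) == a for a in A):
            return e
    return None

def is_group(A, op):
    if not is_semigroup(A, op):
        return False
    e = identity(A, op)
    if e is None:
        return False
    # Every element must have a two-sided inverse.
    return all(any(op(a, b) == e and op(b, a) == e for b in A) for a in A)

Z5 = set(range(5))
add_mod_5 = lambda a, b: (a + b) % 5
mul_mod_5 = lambda a, b: (a * b) % 5
```

Under addition modulo 5 the set Z5 is a group with identity 0. Under multiplication modulo 5 it is only a semigroup with identity 1, because 0 has no inverse; removing 0 yields a group.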
Natural Numbers


• Natural numbers are 0,1,2,.... The set of all natural numbers is denoted by ℕ. The cardinal
number of ℕ is denoted by ℵ₀ (aleph zero).

• A prime is a natural number greater than 1 that has no positive divisors other than 1 and itself.
There are infinitely many primes. A natural number greater than 1 that is not a prime is called
a composite.

• The Fundamental Theorem of Arithmetic states: Any positive integer (≠ 1) can be expressed as
a product of primes; this expression is unique except for the order in which the primes occur.
Thus, any positive integer n (≠ 1) can be written as p₁^α₁ p₂^α₂ ... p_r^α_r, where p₁,p₂,...,p_r are
primes satisfying p₁ < p₂ < ... < p_r, and α₁,α₂,...,α_r are positive integers.


• The Principle of Mathematical Induction is: Any subset of ℕ that contains 0 and, for every
natural k, contains k + 1 whenever it contains k, is equal to ℕ.

• The set (ℕ, ≤) is well-ordered. It is also denoted by ω.

• The Principle of Complete Mathematical Induction: Any subset of ω that, for every natural k,
contains k whenever it contains every natural i < k is equal to ω.

• The set of all subsets of ℕ, i.e., the set 2^ℕ, is uncountable. Its cardinality is 2^ℵ₀. This is equal
to c = |ℝ|, the cardinality of continuum.

• Functions f : ℕᵏ → ℕ, k ≥ 1, are called numerical.

• The set ℕ^ℕ of all functions f : ℕ → ℕ is uncountable: |ℕ^ℕ| = 2^ℵ₀ = c. In particular, the set
{0,1}^ℕ of all characteristic functions χ : ℕ → {0,1} is equinumerous to the set 2^ℕ. Since each
χ is identified with an infinite sequence of 0s and 1s, the set of all infinite binary sequences is
also uncountable.

• The join of two sets A,B ⊆ ℕ is the set denoted by A ⊕ B and defined by A ⊕ B = {2x | x ∈
A} ∪ {2y + 1 | y ∈ B}. Informally, A ⊕ B “remembers” every member of A and every member
of B.
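
The sense in which the join “remembers” both sets can be seen from a small sketch for finite sets of natural numbers. The function names join and unjoin are illustrative; unjoin is the hypothetical inverse that reads A off the even members and B off the odd members.

```python
def join(A, B):
    """The join of A and B: members of A go to even numbers, members of B to odd."""
    return {2 * x for x in A} | {2 * y + 1 for y in B}

def unjoin(J):
    """Recover the pair (A, B) from their join."""
    A = {z // 2 for z in J if z % 2 == 0}
    B = {z // 2 for z in J if z % 2 == 1}
    return A, B

A = {0, 1, 3}
B = {0, 2}
J = join(A, B)
```

Because even and odd numbers never collide, membership of x in A is equivalent to membership of 2x in the join, and membership of y in B to membership of 2y + 1.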


Formal Languages


Basics


An alphabet Σ is a finite nonempty set of abstract symbols.

A word of length k ≥ 0 over the alphabet Σ is a finite sequence x₁,...,x_k of symbols in Σ. A
word x₁,...,x_k is usually written without commas, i.e., as x₁...x_k.

The length of a word w is denoted by |w|. The word of length zero is called the empty word and
denoted by ε.

If w = x₁...x_k is a word, then the word w^R = x_k...x₁ is called the reversal of w.

Two words x₁...x_r and y₁...y_s over the alphabet Σ are equal, written x₁...x_r = y₁...y_s, if
r = s and xᵢ = yᵢ for each i.

Let x and y be words over the alphabet Σ. The word x is a subword of y if y = uxv for some
words u and v. The word x is a proper subword of y if x is a subword of y, but x ≠ y.

Let x and y be words over the alphabet Σ. The word x is a prefix of y, written x ⊆ y, if y = xv
for some word v. The word x is a proper prefix of y, written x ⊂ y, if x is a prefix of y, but x ≠ y.

The set of all words, including ε, over the alphabet Σ is denoted by Σ*.

The set Σ* is countably infinite.

Each subset L ⊆ Σ* is called a formal language (or language for short).
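
The basic notions above map naturally onto Python strings, with the empty string playing the role of the empty word. The sketch below assumes that symbols are single characters; the function names are illustrative.

```python
from itertools import product

def reversal(w):
    return w[::-1]

def is_subword(x, y):
    # y = u + x + v for some words u and v
    return x in y

def is_proper_subword(x, y):
    return is_subword(x, y) and x != y

def is_prefix(x, y):
    # y = x + v for some word v
    return y.startswith(x)

def is_proper_prefix(x, y):
    return is_prefix(x, y) and x != y

def words_of_length(alphabet, k):
    """All words of length k over the alphabet, as a list of strings."""
    return ["".join(t) for t in product(sorted(alphabet), repeat=k)]
```

An alphabet with m symbols has mᵏ words of length k, and exactly one word (the empty word) of length 0.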


Operations on Languages 


If x = x₁...x_r and y = y₁...y_s are words, then xy, called the concatenation of x and y, is the word
x₁...x_r y₁...y_s.

For languages L₁ and L₂, the concatenation (or product) of L₁ and L₂ is a language denoted
by L₁L₂ and defined by L₁L₂ = {xy | x ∈ L₁ ∧ y ∈ L₂}.

For a language L let L⁰ = {ε} and, for each n ≥ 1, let Lⁿ = Lⁿ⁻¹L. The Kleene star of L is
the language denoted by L* and defined by L* = ⋃_{i≥0} Lⁱ. Similarly, the Kleene plus of L is
the language denoted by L⁺ and defined by L⁺ = ⋃_{i≥1} Lⁱ. In particular, for the alphabet Σ,
the language Σⁿ contains all words of length n over Σ, and Σ* contains all words over Σ.
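
The operations on languages can be sketched for finite languages represented as Python sets of strings. Since the Kleene star of a language containing a nonempty word is infinite, the sketch only computes the union of powers up to a given bound n; that truncation and all function names are assumptions made for illustration.

```python
def concat(L1, L2):
    """Concatenation (product) of two finite languages."""
    return {x + y for x in L1 for y in L2}

def power(L, n):
    """The n-th power of L; the 0-th power is the language {empty word}."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result

def kleene_star_up_to(L, n):
    """Finite approximation of the Kleene star: union of powers 0..n."""
    return set().union(*(power(L, i) for i in range(n + 1)))

def kleene_plus_up_to(L, n):
    """Finite approximation of the Kleene plus: union of powers 1..n."""
    return set().union(*(power(L, i) for i in range(1, n + 1)))
```

The only difference between the two approximations is the 0-th power: the empty word always belongs to the Kleene star, but it belongs to the Kleene plus only if it already belongs to L.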


Orders on Languages 


Let < be a linear order on the alphabet Σ. A lexicographic order <_lex on Σⁿ, induced by <,
is the order in which x₁...xₙ <_lex y₁...yₙ if there is a j, 1 ≤ j ≤ n, such that xᵢ = yᵢ for each
i = 1,...,j−1, but x_j < y_j.

A shortlex order on a language L ⊆ Σ* is the order in which words of L are primarily
ordered by increasing length, and words of the same length are then lexicographically ordered.
The shortlex order is a well-order on Σ* and, consequently, on L.
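
The shortlex order described above can be sketched in two ways: as a sort key for comparing given words, and as a generator that lists all words over an alphabet in shortlex order. The sketch assumes the linear order on the alphabet is given by the order of characters in a string alphabet_order; both names are illustrative.

```python
from itertools import count, islice, product

def shortlex_key(word, alphabet_order):
    """Sort key: first by length, then lexicographically by symbol rank."""
    rank = {symbol: i for i, symbol in enumerate(alphabet_order)}
    return (len(word), [rank[symbol] for symbol in word])

def shortlex_words(alphabet_order):
    """Generate all words over the alphabet in shortlex order (never ends)."""
    for length in count(0):
        # product yields tuples in lexicographic order of the input sequence
        for symbols in product(alphabet_order, repeat=length):
            yield "".join(symbols)

first_seven = list(islice(shortlex_words("ab"), 7))
```

Because every word appears at a finite position of this enumeration, the generator also illustrates why the set of all words over a finite alphabet is countably infinite.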


Appendix B 
Notation Index 


Frontmatter and Chapter 2 


Box   detour

NB   nota bene

A,B,C,...   sets

x ∈ A   x is a member of A

A ⊆ B   A is a subset of B

Ā   the complement of A

A ∪ B   union of A and B

A ∩ B   intersection of A and B

A − B   set-theoretic difference of A and B
2^A   power set of A

(x,y)   ordered pair where x is the first and y the second member
A × B   Cartesian product of A and B

|A|   cardinality of A

ℕ   the set of all natural numbers

ℵ₀   |ℕ|, the least transfinite cardinal

ℵ   transfinite cardinal


≤   is less than or equal to (used for numbers)

ω   well-ordered set (ℕ, ≤), the least transfinite ordinal

ℝ   the set of all real numbers

c   |ℝ|, the cardinality of continuum

iff   if and only if

Ω   set of all ordinal numbers (paradoxical)

U   set of all sets (paradoxical)

R   Russell’s set of all sets not containing themselves (paradoxical)
ℤ   the set of all integers


PM   Principia Mathematica

Chapter 3

f.a.s.   formal axiomatic system

F   a particular f.a.s., the theory developed in this f.a.s.
F   meta-theory, the theory about F

a,b,c,...   symbols for individual constants in an f.a.s.

x,y,z,...   symbols for individual variables in an f.a.s.
f,g,h,...   symbols for functions in an f.a.s.


symbols for predicates in an f.a.s.
symbols for formulas in an f.a.s.
and
or
not
implies
is equivalent to
there exists
for all
rule of inference
Modus Ponens
Generalization
F is derivable (formally provable) from the set of premises P
F is a theorem of F, i.e., derivable (formally provable) in F
natural language, object language
metalanguage of the language
restricted language, formalized language
metalanguage of the language
mathematical structure
mapping that assigns meaning to F in S
F is a logical consequence of the set of premises P
F is valid in F
Propositional Calculus
First-Order Logic
Formal Arithmetic
Zermelo-Fraenkel axiomatic set theory
ZF with Axiom of Choice
Von Neumann-Bernays-Gödel’s axiomatic set theory
formal axiomatic system for all mathematics
decision procedure for M
Gödel number of syntactic object X
Gödel’s formula
Continuum Hypothesis
Generalized Continuum Hypothesis
total functions from ℕᵏ to ℕ
zero function
successor function
projection function
expression containing variable x
μ-operation, the least x such that [...x...] = 0 and [...z...]↓ for z < x
system of equations defining a function f
partial function of x, defined by [...x...]
α-conversion, renaming of variables in a λ-term
β-contraction, application of a λ-term
β-reduction
sequence (composition) of β-reductions
β-normal form
Turing machine

Turing program 

input alphabet 

production 

code of an abstract computing machine M 
is formalized by 

Computability Thesis, i.e., Church-Turing Thesis
partial functions

φ(x) is defined

φ(x) is defined and equal to y

φ(x) is undefined

domain of φ

range of φ

equality of partial functions

partial computable 


Turing machine 

a TM (basic model) 

the TM with index (code number) n 
tape alphabet 

empty space 

input alphabet 

the set of all words over Σ

set of states 

initial state 

set of final states 

Turing program 

a Turing program 

the TP with index (code number) n 
a final state 

a non-final state 

matrix describing δ

a Turing machine (generalized model) 
code of T 

universal Turing machine 
operating system 

random access machine 

k-ary proper function of T 

the k-ary proper function of Tᵢ
domain of go” (x) 

empty word 

generator of A 

computably enumerable 

proper set of T, i.e., language of T 
universe, a large enough set 
characteristic function of A 
decider of A 

recognizer of A 

the set of prime numbers 
ind(φ)   index set of a p.c. function φ
ind(A)   index set of a c.e. set A


Chapter 8


o-TM
T*
⟨T*⟩


Tᵢ*

decision problem 

code of an instance d of a decision problem

the set of codes of all instances of a decision problem D
language of decision problem D

Halting problem, “Does T halt on w?”

Halting problem, “Does T halt on ⟨T⟩?”

universal language

diagonal language

complementary problem of a decision problem D
non-Halting problem, “Does T never halt on w?”
non-Halting problem, “Does T never halt on ⟨T⟩?”
empty proper set problem, “Is L(T) = ∅?”

n-state Busy Beaver

Busy Beaver Problem, “Is T a Busy Beaver?”

Post’s Correspondence Problem

Post’s Correspondence Problem

context-free grammar

context-free language

“Is dom(φ) empty?”

“Is dom(φ) finite?”

“Is dom(φ) infinite?”

“Is A − dom(φ) finite?” (where φ : A → B)

“Is φ total?”

“Can φ be extended to a total computable function?”
“Is φ surjective?”

semi-Thue system with a set 7 of rules over Y 
transformation in a semi-Thue system 

sequence (composition) of transformations in a semi-Thue system 


switching function 

is m-reducible to (set, decision problem, function) 
is 1-reducible to (set, decision problem, function) 
Entscheidungsproblem 

class of all c.e. sets 


oracle Turing machine 

an o-TM (with no oracle set attached) 

the code of T* 

the o-TM with index i (and no oracle set attached) 




Chapter 11 


≤_T
≡_T

at i 
deg(S) 
< 


Chapter 12 


ucone(d) 
lcone(d)


Chapter 14 


≤_btt
≤_tt
xeElaA 
xZlA 


o-TM with oracle set O attached 

an O-TM 

the O-TM with index i 

oracle Turing program 

an o-TP (i.e., a transition function of an o-TM) 

the o-TP with index i 

without loss of generality 

proper functional of the o-TM Tᵢ* (with arity understood)
proper functional of the O-TM Tᵢ^O (with arity understood)
partial O-computable (function) 

O-semi-decidable (set) 

index set of the O-p.c. function @ 

index set of the O-c.e. set S 


is T-reducible to (set, decision problem, function) 

is T-equivalent to (set, decision problem, function) 
is ≤_T but not ≡_T to (set, decision problem, function)
T-degree (degree of unsolvability) of the set S 

is lower than (T-degree) 


the T-jump of the set S
the same as S’ 
the n-th T-jump of the set S 


the class of all T-degrees 

T-degrees 

is lower than or equal to (T-degree) 
the T-jump of d 

the n-th T-jump of d 

the T-degree of the set ∅

the n-th T-jump of 0 

the word x is a prefix of the word y 
the word x is a proper prefix of the word y 
the least upper bound of T-degrees 
the greatest lower bound of T-degrees 
join of sets A and B 

upper cone of d 

lower cone of d 


is bounded truth-table-reducible to (set, decision problem, function) 
is truth-table-reducible to (set, decision problem, function) 

add (enumerate) x into the set A 

ban x from the set A 
Σₙ   arithmetical class
Πₙ   arithmetical class
Δₙ   arithmetical class
graph(φ)   graph of φ


Chapter 16


the informal class of all (intuitively) “computable” functions 

the class of all Church’s λ-definable functions

the assignment operation, i.e., ‘let ... obtain the value of ...’ 

the informal class of all “computable” functions of positive integers 
the class of all Church’s λ-definable functions of positive integers

the class of all Gödel’s (general) recursive functions of positive integers
the class of all Kleene’s μ-recursive functions of positive integers

the class of all Post’s finite-combinatory-process computable functions 
the set of all Turing-computable numbers 

the set of all (intuitively) “computable” numbers 

the class of all Turing-computable functions 


justified (informally proved) inclusion 


justified (informally proved) equality 

the class of all Turing-computable functions of positive integers 
a black hole 

a Turing machine approaching a black hole 
proper time on a black hole 

the Earth 

a universal Turing machine stationed on Earth 
proper time on Earth 

the Sun 

the solar mass 

the lunar mass 


Glossary


A 


abstract computing machine an instance of a model of computation, e.g., a partic- 
ular Turing machine, or a particular Post machine, or a particular Markov grammar 


abstraction (in λ-calculus) an operation in λ-calculus that binds a free variable in
a λ-term and thus constructs a new λ-term that denotes a function of that variable


accelerating machine a hypercomputer that carries out each next basic operation 
(called by the program) in half the time taken to carry out the previous one 


Ackermann function a very rapidly growing function that is defined inductively
on pairs of natural numbers and is μ-recursive but not primitive recursive


algorithm (intuitively) a finite list of precisely described unambiguous instructions 
that is supposed to be applied and mechanically followed through to a conclusion in 
order to accomplish a specified computation or other task 


algorithm (formally) a Turing program (or a construction of a μ-recursive function;
or a system E(f) of equations; or a λ-term; or a Post program; or a Markov grammar)


alphabet a finite, nonempty set whose elements are referred to as symbols 


arithmetical class (of sets) a class of arithmetical sets characterized by predicates
with n alternating quantification symbols and the same first quantification symbol; if
the first quantification symbol is ∃ (∀), the class is Σₙ (Πₙ); in addition, Δₙ = Σₙ ∩ Πₙ


arithmetical hierarchy (of arithmetical classes) a hierarchy consisting of the
arithmetical classes Σₙ, Πₙ, and Δₙ for n = 0,1,2,..., and inclusions between them


arithmetical relation a relation defined on the set of natural numbers; that is, a
subset of ℕᵏ, for some k


arithmetical set a set of natural numbers x characterized by a predicate of the form
∃y₁∀y₂∃y₃ ... Qyₙ R(x,y₁,y₂,...,yₙ) or ∀y₁∃y₂∀y₃ ... Qyₙ R(x,y₁,y₂,...,yₙ), where
n ≥ 0 and R is a decidable arithmetical relation


arithmetization (of a theory) the treatment of the theory by methods involving 
only the fundamental concepts and operations of arithmetic, such as natural and 
prime numbers and their sums and products 


axiom (of a theory) a statement whose truth is either to be taken as self-evident or 
to be assumed 


axiomatic method a method of developing a theory by first choosing a set of basic 
notions and a set of axioms about these notions, and then discovering their conse- 
quences (by deducing new theorems only from axioms or previously deduced theo- 
rems, and defining new notions only using the basic or previously defined notions) 


axiomatic set theory a theory of sets developed by the axiomatic method from 
some basic notions that include those of the “set” and the “membership” relation, 
and some axioms that are consistent and in accord with intuitive beliefs about sets 


axiomatic system a collection of basic notions and axioms about the basic notions 


axiomatizable theory a theory for which there is an algorithm that can decide, for 
any formula, whether or not the formula is an axiom of the theory 


B 


behavior (of a mechanism) mechanisms may differ in their local behavior (when 
their moves are governed by different mechanical rules) but they may still be equal 
in their global behavior (i.e., they produce equal results) 


Burali-Forti’s paradox the unacceptable conclusion derived in Cantor’s set theory
stating that there exists a well-ordered set (namely the set of all ordinal numbers)
whose ordinal number is larger than itself


busy beaver (n-state) a Turing machine with n states that writes the largest number
of 1s that any n-state Turing machine can write and still halt


busy beaver function the function whose value at n is the number of 1s written by
an n-state busy beaver


busy beaver problem the problem of deciding, for an arbitrary Turing machine,
whether or not the machine is an n-state busy beaver for some n


C


canonical system a generator consisting of a start symbol S, an alphabet Σ, and a
finite set of productions that can transform S through various sequences of production
applications into various words in Σ* (and thus generate a set of words over Σ)


Cantor’s paradox the unacceptable conclusion derived in Cantor’s set theory
stating that there exists a set (namely the set of all sets) whose cardinality both 
is and is not strictly less than the cardinality of its power set 


Cantor’s theorem the theorem stating that the cardinality of a set is strictly less 
than the cardinality of its power set 


c.e. degree see computably enumerable degree 
c.e. set see computably enumerable set 


characteristic function (of a set) a function whose value is 1 for every member of
the set, and 0 otherwise


characterization (of the notion of algorithm) the process of searching for (or the 
state after finding) a property that is shared by all algorithms and algorithms only 


Church-Turing barrier the asserted fundamental logical limit on what can be 
computed no matter how far and in what multitude of ways computers develop; 
synonymous with Turing-computability 


Church-Turing thesis see computability thesis 


class (generally) a generalization of the notion of set specifying that every set is a 
class, but some classes, called proper classes, are not sets 


class D (of all degrees of unsolvability) the class of all Turing degrees 


coding function (for a problem) a function that transforms every instance of a com- 
putational problem, which is to be solved on a machine, into a form understandable 
by the machine (namely into a word over the input alphabet of the machine) 


compactness theorem the theorem stating that a first-order theory has a model if 
every finite part of the theory does 


completeness requirement (in defining “computable” functions) the requirement 
that any formalization of the informal notion of the intuitively computable function 
must include all such functions and nothing else 


complexity (of an algorithm) an asymptotic estimation of the amount of computa- 
tional resources needed for a complete execution of the algorithm; usually expressed 
in terms of the size of the input to the algorithm 


computability theory the theory that classifies computational problems according 
to whether or not they can be algorithmically solved at least in principle, i.e., their 
solving has at its disposal unrestricted computational resources, e.g., time and space 


computability thesis the assertion that the intuitive notion of what is computable 
is adequately formalized by the notion of computable by the Turing machine (or, 
equivalently, by any model of computation equivalent to the Turing machine) 


computable function a total function φ : A → B for which there exists a Turing
machine that can compute the value φ(x) for any argument x ∈ A


computable problem a computational problem for which there exists a Turing
machine capable of computing the solution to any instance of the problem, if the 
solution is defined 


computable set see decidable set 
computably enumerable (or c.e.) degree a Turing degree containing some c.e. set 


computably enumerable (or c.e.) set a set whose members can be listed (enumer- 
ated) by a Turing machine (which may run forever, if necessary) 


computation intuitively, a sequence of steps that a human or a device makes while 
following a finite list of instructions that tell how to solve a problem; formally, a 
sequence of steps that a Turing machine makes while following its Turing program 


computational complexity theory a theory that classifies computational problems 
according to whether or not they can be solved with appropriately restricted compu- 
tational resources, e.g., within appropriately bounded time or space 


computational problem a problem whose instances require certain computations 
to yield their solutions; it can be a decision, search, counting, or generating problem 


configuration (internal, of an abstract computing machine) the collection of sta- 
tuses of the relevant components of the machine at a particular step of computation; 
informally, a snapshot of the machine at a particular point of its computation 


consistency problem the problem asking whether or not a given theory is consistent 


consistent (theory) a theory that does not contain a contradiction, i.e., there is no 
formula such that both the formula and its negation are derivable in the theory 


continuum hypothesis the assertion that there exists no set whose cardinality is 
strictly between that of the integers and the real numbers 


counting problem a computational problem whose each instance asks for the 
number of the members of a given set that have a given property 


creative set a c.e. set C for which there is a p.c. function φ such that, for any c.e. set
W_x, the value φ(x) witnesses that C̄ ≠ W_x; informally, C is effectively undecidable


D


decidability problem the problem asking whether or not a given theory is decidable


decidable problem a decision problem for which there is a Turing machine that 
can compute, for any instance of the problem, whether the answer to the instance is 
YES or NO 


decidable property (of p.c. functions) an intrinsic property of functions for which 
there is a Turing machine that can decide, for any p.c. function, whether or not the 
function has the property


decidable property (of c.e. sets) an intrinsic property of sets for which there is
a Turing machine that can decide, for any c.e. set, whether or not the set has the
property


decidable relation (k-ary, on a set S) a subset R of S^k for which there is a Turing
machine that can decide, for any (a_1, ..., a_k) ∈ S^k, whether or not (a_1, ..., a_k) ∈ R


decidable set a set for which there is a Turing machine that can decide, for any 
element, whether or not the element is a member of the set 


decidable theory a consistent and syntactically complete theory for which there 
exists a Turing machine capable of answering—in finite time, and for any formula 
F of the theory—the question “Is F derivable in the theory?” 


decision problem a computational problem whose each instance asks for an answer
that is either YES or NO 


decider (of a set) a Turing machine capable of deciding, for any element, whether 
or not the element is a member of the set 


deduction the process of reaching a conclusion about something because of other 
things, called premises, that we know or assume to be true 


degree of unsolvability (of an unsolvable problem) an informal notion that rep- 
resents our intuitive understanding of the hardness of the unsolvable problem; 
for decision problems it is formalized by the concept of Turing degree 


density theorem the theorem stating that between any two c.e. degrees there exists 
a third c.e. degree 


derivation (of a word v from a word u) a finite sequence of substitutions that trans- 
form u into v while complying with a given set of rules (productions) 


derivation (of a theorem F in a theory) a finite sequence of formulas whose last 
formula is F and each formula in the sequence is either an axiom of the theory or di- 
rectly follows from some of the preceding formulas by one of the rules of inference 


diagonalization a technique for proving that a set S cannot be exhibited by listing
the elements of T ⊆ S, in which T is represented by a table whose rows represent
members of T and whose appropriately changed diagonal would represent a mem-
ber of S yet differ from every row, and hence represent none of the members of T


diagonal language the language K of the problem of deciding, for an arbitrary Tur-
ing machine T, whether or not T halts on its code, i.e., K = {⟨T, T⟩ | T halts on ⟨T⟩}


Δ_n-complete set a Δ_n-set such that every Δ_n-set is m-reducible to it


Δ_n-set a set of natural numbers that is both a Σ_n-set and a Π_n-set


domain (of interpretation) a structure in which a theory developed in a formal ax- 
iomatic system is interpreted (given meaning); a field of interest in which formulas 
of a formal theory become true or false statements about the objects of the field


dovetailing a technique that avoids getting trapped in any of countably many
potentially non-halting computations by executing them simultaneously, i.e., by 
periodically making small progress in each of the computations (or in each of a 
growing subset of them) 
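

A minimal sketch of dovetailing, assuming each computation is modelled as a Python
generator that yields once per step and returns when it halts. The names dovetail,
halts_after, loops_forever, and example are hypothetical and serve only as illustration.

```python
from itertools import count

def halts_after(k):
    """Hypothetical computation that halts after k steps."""
    for _ in range(k):
        yield

def loops_forever():
    """Hypothetical computation that never halts."""
    while True:
        yield

def dovetail(make_computation, wanted):
    """Run computations 0, 1, 2, ... in interleaved rounds.

    In round r, computation r is started and every computation still
    running advances by one step, so a non-halting computation cannot
    block the others. Returns the indexes of the first `wanted`
    computations observed to halt.
    """
    running = {}   # index -> generator still in progress
    halted = []
    for r in count():
        running[r] = make_computation(r)
        for i in list(running):
            try:
                next(running[i])        # one step of computation i
            except StopIteration:       # computation i has halted
                del running[i]
                halted.append(i)
                if len(halted) == wanted:
                    return halted

def example(i):
    # Even-indexed computations never halt; odd-indexed ones halt after i steps.
    return loops_forever() if i % 2 == 0 else halts_after(i)
```

Calling dovetail(example, 3) returns even though infinitely many of the computations
never halt; running them one after another would get stuck in computation 0.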


E 


effective procedure (for calculating a function’s values) a finite set of instructions 
in any language that, given any input in the domain of the function, completes in a 
finite number of mechanically executed steps and returns the function’s value 


effectively calculable (function) a function whose values can be computed by an 
effective procedure; that is, a “computable” (intuitively computable) function 


effectiveness requirement (in defining “computable” functions) the requirement 
that any formalization of the informal notion of the intuitively computable function 
must exhibit, for each such function, an effective procedure for calculating its values 


elementary function a function constructed from finitely many exponentials e^(·),
logarithms log(·), roots ⁿ√(·), real constants, and the variable x, by using finitely
many function compositions and operations +, −, ×, and ÷


Entscheidungsproblem the decidability problem of mathematics, i.e., the question 
asking whether or not there is an algorithm that can decide, for any formula of a the- 
ory formalizing mathematics, whether or not the formula is derivable in the theory 


enumeration function (of a set S) a computable function g : ℕ → S such that
{g(i) | i ∈ ℕ} = S; if such a g exists, then S is enumerated by g, in the sense that an
x ∈ S is taken to be nth in S iff n is the smallest i ∈ ℕ for which x = g(i)


enumeration problem see generation problem 


equivalent models (of computation) models of computation that have the same 
global behavior, e.g., accept the same language 


equivalent grammars grammars that generate the same language 


extensional definition a definition that gives the meaning of a term by listing ev- 
erything that falls under that term 


F 


finitism the kind of mathematical reasoning which—in order to avoid deceptive 
intuition and doubts arising from the use of the notion of infinity—preferably uses 
finite objects and finite methods that are constructive, at least in principle 


finite-injury priority method a priority method in which a requirement can only 
be injured finitely many times


first incompleteness theorem (Gödel’s) the theorem stating that if the Formal
Arithmetic is consistent, then it is semantically incomplete, i.e., incapable of proving 
all Truths about the natural numbers 


first incompleteness theorem (generalized) the theorem stating that any consistent 
extension of the set of axioms of Formal Arithmetic induces a semantically incom- 
plete theory, i.e., a theory incapable of proving all Truths about the natural numbers 


fixed-point theorem see recursion theorem 


formal arithmetic the formal axiomatic system and the induced theory that for- 
malize the arithmetic of natural numbers 


formal axiomatic system a system consisting of a symbolic language (for writ- 
ing formulas), a set of axioms (selected formulas), and a set of rules of inference 
(specifying the conditions under which formulas can be derived from other formulas) 


formalism the treatment of mathematics that replaces contentual¹ mathematics
by mechanical manipulation of meaningless symbols in accordance with accepted 
string manipulation rules 


formalization of computation a definition that formally defines, in terms of a 
model of computation, the basic intuitive notions of algorithmic computation, 
i.e., the algorithm, the required environment, and the execution of the algorithm in it 


formal theory a set of formulas in the symbolic language of a formal axiomatic 
system such that any formula derivable from the axioms of the system by the rules 
of inference of the system is called a theorem of the theory 


foundations of mathematics the study of the philosophical and logical basis of 
mathematics; in a broader sense, the mathematical investigation of what underlies 
the philosophical theories concerning the nature of mathematics 


function application (in the λ-calculus) an operation in the λ-calculus that sub-
stitutes every bound occurrence of a variable in one λ-term (denoting a function of
that variable) with another λ-term (denoting the argument of the function)


G 


(general) recursive function a function that can be well defined by a system of 
equations in standard form, and whose values can be computed from the system by 
using only the rules of substitution and replacement 


generalized padding lemma the lemma stating that an O-p.c. function has count- 
ably infinitely many indexes, and given one of them, countably infinitely many oth- 
ers can be generated 


¹ contentual: relating to content


generator (of a set) a model that formally defines the intuitive concept of the algo-
rithmic generation of a set, i.e., algorithmic listing of all the elements of a set


generation problem a computational problem whose each instance asks for a list 
(i.e., enumeration) of the members of a given set that have a given property; a listed 
member is labeled with n iff the first occurrence of the member is nth in order 


global behavior (of an abstract computing machine) the results that the machine 
can or cannot produce (regardless of its workings, i.e., local behavior) 


Gödel numbering (of a formal theory) a computable injective function that maps
the syntactic objects of the theory (symbols, formulas, finite sequences of formulas) 
to the natural numbers, such that there is an algorithm that, for any natural n, decides 
whether n is the image of some syntactic object, and if so, identifies the object 


grammar a quadruple consisting of a set of symbols called nonterminals, a disjoint 
set of symbols called terminals, a designated nonterminal called the start symbol, 
and a finite set of rules called productions for transforming words into other words 


H 


halting problem the problem of deciding, for an arbitrary Turing machine T and 
an arbitrary input x, whether or not T halts on x 


Hilbert’s tenth problem the problem of deciding, for an arbitrary multivariate 
polynomial equation p(x_1, ..., x_n) = 0, whether or not the equation is solvable in
the integers 


Hilbert’s program (for mathematics) a formalistic attempt that would use formal 
axiomatic systems to eliminate all known and unknown paradoxes from mathematics 


hypercomputer a notional computing machine able to compute beyond the Church- 
Turing barrier, i.e., beyond Turing-computability 


hypercomputation any mode of computation that goes beyond what is permitted 
by the Church-Turing barrier, i.e., is not bounded by Turing-computability 


I 


incomputable function a function for which there is no Turing machine capable 
of computing the function’s values everywhere the function is defined 


incomputable problem a computational problem for which there exists no Turing 
machine capable of computing the solution to every instance of the problem where 
the solution is defined 


incomputable set see undecidable set


index (of a Turing machine) a natural number that encodes a Turing machine;
informally, a number that represents an algorithm 


index (of a p.c. function) the index of a Turing machine that can compute the values 
of the function 


index set (of a p.c. function) the set of indexes of all Turing machines capable of 
computing the values of the function (anywhere the function is defined); informally, 
the set of all the (encoded) algorithms that compute the function’s values 


index set (of a semi-decidable set) the set of indexes of all Turing machines that 
recognize the set; informally, the set of all the (encoded) recognizers of the set 


index set (of an O-p.c. function) the set of indexes of all O-TMs that can compute 
the values of the function (whenever defined); informally, the set of all the (encoded) 
algorithms that compute the function’s values with the help of the oracle for the set O 


injury the change of the status of a satisfied requirement R back to unsatisfied, 
because a later action A’ (taken to satisfy some unsatisfied requirement R’) conflicted 
with the preservation of the previous action A (taken to satisfy the requirement R) 


intended model (of a theory) a particular structure (i.e., field of interest) for the in- 
vestigation of which a formal axiomatic system is established and theory developed 


intensional definition a definition that gives the meaning of a term by specifying 
necessary and sufficient conditions for when the term should be used. In the case of 
nouns, this is equivalent to specifying the properties that an object needs to have in 
order to be counted as a referent of the term 


internal configuration see configuration 


interpretation (of a formal theory) a mapping that gives the formal theory meaning 
in a field of interest, i.e., defines, for every closed formula of the theory, how the 
formula is to be understood as a statement about the objects of the field 


intrinsic property (of p.c. functions) an essential property of functions that is in- 
sensitive to the algorithm, machine, and program that compute the functions’ values 


intrinsic property (of c.e. sets) an essential property of sets that is insensitive to 
the algorithm, machine, and program solving the membership problem for the sets 


intuitionism a mathematical school that argued for greater mathematical rigor in 
the process of proving; it advocated a (non-Platonic) view that the existence of a 
mathematical object is closely connected to the existence of its mental construction 


J


jump see Turing jump operator


jump hierarchy (of sets) the hierarchy ∅ <_T ... <_T ∅^(i) <_T ∅^(i+1) <_T ..., where
the set ∅^(i) is the ith Turing jump of the decidable set ∅


jump hierarchy (of Turing degrees) the hierarchy 0 < ... < 0^(i) < 0^(i+1) < ...,
where Turing degree 0^(i) is the ith Turing jump of 0, the Turing degree of the
decidable sets


L 


λ-calculus a model of computation that transforms, via a sequence of reductions,
a given initial λ-term, which represents a function and its arguments, into a final
λ-term, which represents the corresponding value of the function


λ-definable function a function f of one positive integer for which there exists a
λ-term F such that if f(m) = n and M and N are λ-terms denoting m and n, respec-
tively, then λ-term FM can be transformed into N with a sequence of reductions


λ-term a well-formed expression in λ-calculus built from variables and other well-
formed expressions by finitely many abstractions and function applications


language (formal) a set of finite words consisting of symbols from an alphabet 
language (accepted by a Turing machine) see proper set 


language (generated by a grammar G) the set of all words that consist of G’s 
terminals only and can be derived from G’s start symbol by using G’s productions 


language (of a decision problem) the set of the codes of all the positive instances 
of the problem 


liar paradox the sentence “This sentence is false”, which complies with syntactic 
and semantic rules yet cannot consistently be assigned a truth-value because either 
of the premises—the sentence is true; the sentence is false—implies its own negation 


local behavior (of an abstract computing machine) the way in which the machine 
operates when its basic instructions are performed 


logical axiom an axiom that epitomizes a principle of pure logical reflection and is 
therefore present in every axiomatic system 


logically valid formula a formula of a theory that is valid under every interpreta- 
tion of the theory 


logicism a school of mathematics that aimed to found mathematics on pure logic; 
as a side-effect it developed a symbolic language of mathematics, concisely formula- 
ted its rules of inference, and thus gave mathematics concise and precise expression 


Löwenheim-Skolem theorem the theorem stating that if a theory has a model,
then it has a countable model


M


Markov algorithm a finite sequence α_1 → β_1, ..., α_n → β_n of productions that
transform a given input word via a sequence of intermediate words into some output 
word by always applying the first applicable production to the last intermediate word 
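

A minimal sketch of how a Markov algorithm runs. It assumes the usual conventions
that a production rewrites the leftmost occurrence of its left-hand side and that
productions may be marked as final (terminating). Neither convention is stated in the
entry above, and the names run_markov, ADD, and max_steps are hypothetical.

```python
def run_markov(productions, word, max_steps=10_000):
    """Apply a Markov algorithm to `word`.

    `productions` is an ordered list of (alpha, beta, is_final) triples.
    At each step the first applicable production rewrites the leftmost
    occurrence of alpha in the current word. The run stops after a final
    production is applied or when no production is applicable.
    """
    for _ in range(max_steps):
        for alpha, beta, is_final in productions:
            if alpha in word:
                word = word.replace(alpha, beta, 1)   # leftmost occurrence only
                break
        else:
            return word        # no production applicable: halt
        if is_final:
            return word        # a final production was applied: halt
    raise RuntimeError("step limit reached; the algorithm may not halt")

# Unary addition: erase the '+' between two blocks of strokes.
ADD = [("+", "", False)]
```

The step limit is only a safeguard for experimentation; a Markov algorithm, like a
Turing machine, need not halt on every input.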


Markov-computable function a function for which there is a Markov algorithm
that can compute the values of the function anywhere the function is defined 


m-complete (set) a c.e. set such that every c.e. set is m-reducible to it


mechanism a device with predictable local behavior, in the sense that each move 
of the device is governed by some mechanical rule 


membership problem (for a set) the problem of deciding, for an arbitrary element, 
whether or not the element is in the set 


metalanguage a language used to describe or analyze another language (called the 
object language), that is, to make statements about statements of the object language 


metamathematics the study of mathematics itself using mathematical methods, i.e., 
the field of study that deals with the structure and formal properties of mathematics 


metatheory a theory whose subject matter is some theory; in mathematics, for ex- 
ample, a metatheory is a mathematical theory about some other mathematical theory 


model (of a theory) an interpretation of the theory under which all the axioms of 
the theory are valid; intuitively, a field of interest that the theory sensibly formalizes 


model (of computation) a definition that formally describes and characterizes the 
basic notions of algorithmic computation: what the algorithm is; what the environ- 
ment capable of executing algorithms is; and what computation is 


μ-operation see unbounded minimization


mortal matrix problem the undecidable decision problem asking whether or not 
the matrices of a given finite set of square matrices can be multiplied in some order, 
possibly with repetitions, so that the product is the zero matrix 


μ-recursive function a (partial) function that can be constructed from the functions
ζ(n) = 0 (zero), σ(n) = n+1 (successor), and π_i^k(x_1, ..., x_k) = x_i (projection) using
the operations of composition, primitive recursion, and μ-operation
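

A minimal sketch of the μ-operation (unbounded minimization), assuming Python
functions on natural numbers stand in for the functions being combined. The names mu
and isqrt_mu are hypothetical.

```python
def mu(f):
    """Return g with g(*xs) = the least y such that f(*xs, y) == 0.

    The search tries y = 0, 1, 2, ... and never terminates if no such y
    exists, which is why the resulting function is partial in general.
    """
    def g(*xs):
        y = 0
        while f(*xs, y) != 0:
            y += 1
        return y
    return g

# Integer square root, obtained as the least y with (y + 1)^2 > x.
isqrt_mu = mu(lambda x, y: 0 if (y + 1) ** 2 > x else 1)
```

Here the search always succeeds, so isqrt_mu happens to be total; applying mu to a
function that is never zero for some argument yields a function undefined there.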


m-reduction (between decision problems) a transformation of a decision problem 
P into a decision problem Q such that the positive (resp. negative) instances of P 
transform into the positive (resp. negative) instances of Q 


m-reduction (between sets) a mapping of a set A to a set B such that the members 
(resp. non-members) of A are mapped into the members (resp. non-members) of B


N


negative instance (of a decision problem) an instance of the decision problem 
whose answer is NO 


nondeterministic Turing machine a Turing machine whose program defines, for 
each pair (state, symbol), a finite set of alternative moves out of which the machine 
guesses the one leading to a successful termination of the computation, if such exists 


nondiamond theorem the theorem stating that there exists no pair of c.e. degrees 
with greatest lower bound 0 (the Turing degree of decidable problems) and least 
upper bound 0’ (the Turing degree of the complete semi-decidable problems) 


non-logical axiom see proper axiom 


non-proper class a class that is also a set (being a set, such a class is a member of
its power set, but this is a class by definition) 


nonterminal (in a grammar) a symbol of the grammar that can be replaced by the 
right-hand side of some production of the grammar 


normal system a canonical system whose productions are simplified in a particular 
way while retaining their generating power 


O 


object language a language which is the object of study, i.e., whose statements are 
described and analyzed in some metalanguage 


O-computable function a total function φ : A → B for which there exists an oracle
Turing machine with oracle set O that can compute the value φ(x) for any x ∈ A


O-decidable set a set whose characteristic function is O-computable 


O-incomputable function a function for which there is no O-TM that can compute 
the function’s values everywhere the function is defined 


O-p.c. function see partial O-computable function 


oracle (for a set) a miraculous and unspecified means that can immediately decide, 
for any element, whether or not the element is in the set (the so-called oracle set) 


oracle set a set whose membership problem is assumed to be decidable by an oracle 


oracle Turing machine a Turing machine with specified oracle Turing program 
and unspecified oracle set (e.g., when the oracle set is yet to be specified or changed) 


oracle Turing program a program in the Turing machine’s control unit that deter- 
mines the next move of the machine, based on the machine’s current state, the last 
symbol read from the tape, and the oracle’s answer to the program’s current question


O-semi-decidable set a set for which there is an O-TM that can determine, for any
element in the set, that the element is in the set; if in truth the element is not in the 
set, the machine determines that the element is not in the set—or never halts 


O-TM an oracle Turing machine with oracle set O 


O-undecidable set a set for which there is no O-TM capable of deciding, for every 
element, whether or not the element is in the set 


P 


padding lemma the lemma stating that a p.c. function has countably infinitely 
many indexes; and if one of them is given, then countably infinitely many others 
can be generated 


pairing function a computable bijective function f : ℕ² → ℕ whose inverse func-
tions g and h, defined by f(g(n), h(n)) = n, are computable
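

One well-known pairing function is the Cantor pairing function. The sketch below
uses it purely as an illustration; the entry above does not single out any
particular pairing function, and the names pair and unpair are hypothetical.

```python
from math import isqrt

def pair(x, y):
    """Cantor pairing function: a bijection from N x N onto N."""
    return (x + y) * (x + y + 1) // 2 + y

def unpair(n):
    """Inverse of `pair`: return (g(n), h(n)) with pair(g(n), h(n)) == n."""
    w = (isqrt(8 * n + 1) - 1) // 2     # index of the diagonal containing n
    y = n - w * (w + 1) // 2            # position of n on that diagonal
    return w - y, y
```

Both directions use only integer arithmetic, so f, g, and h are all computable, as
the definition requires.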


paradox an unacceptable conclusion derived by apparently acceptable reasoning 
from apparently acceptable premises 


p.c. function see partial computable function 


parameter theorem a theorem stating that the variables with fixed values (parame-
ters) of a multi-variable p.c. function φ can always be incorporated into φ’s program
to obtain the program for the induced function φ′ on the rest of the variables


partial function a function φ : A → B whose value φ(x) may be undefined for some x ∈ A


partial computable (or p.c.) function a partial function for which there is a Turing 
machine that can compute the function’s values anywhere the function is defined 


partial O-computable (or O-p.c.) function a partial function for which there is an 
O-TM that can compute the function’s values anywhere the function is defined 


Π_n-complete set a Π_n-set such that every Π_n-set is m-reducible to it


Π_n-set a set of exactly those natural numbers x for which the predicate of the form
∀y_1 ∃y_2 ∀y_3 ... Q y_n R(x, y_1, y_2, ..., y_n) is true, where Q is ∀ (∃) if n is odd (even), and
R is a decidable arithmetical relation


Platonism (or Platonic view) the view that there are abstract mathematical objects 
(e.g., numbers, sets) that exist independently of us (our thought, language, practices); 
our statements about such objects are made true or false by the objects themselves; 
therefore, mathematical truths are discovered, not invented 


positive instance (of a decision problem) an instance of the decision problem 
whose answer is YES 


Post-computable function a function for which there exists a Post machine that 
can compute the function’s values anywhere the function is defined


Post’s correspondence problem the undecidable decision problem asking, given
two finite lists u_1, ..., u_n and v_1, ..., v_n of words over an alphabet Σ, whether or not
there is a sequence i_1, ..., i_k of indexes such that u_{i_1} ... u_{i_k} = v_{i_1} ... v_{i_k}
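

Because the problem is undecidable, no program can answer it for every instance.
The sketch below therefore only searches for solutions up to a given length; the
name pcp_search and the bound max_len are hypothetical.

```python
from itertools import product

def pcp_search(us, vs, max_len):
    """Search for a solution of length at most `max_len`.

    Returns a tuple of 1-based indexes (i_1, ..., i_k) such that
    u_{i_1}...u_{i_k} == v_{i_1}...v_{i_k}, or None if no solution is
    found within the bound. A result of None does not prove that the
    instance has no solution.
    """
    n = len(us)
    for k in range(1, max_len + 1):
        for idx in product(range(n), repeat=k):
            top = "".join(us[i] for i in idx)
            bottom = "".join(vs[i] for i in idx)
            if top == bottom:
                return tuple(i + 1 for i in idx)
    return None
```

For the instance with lists ("a", "ab", "bba") and ("baa", "aa", "bb"), the sequence
3, 2, 3, 1 is a solution: both concatenations give the word bbaabbbaa.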


Post machine a model of computation that has a control unit; a queue (for symbols) 
connected to the control unit; a read-only tape with cells containing symbols, a 
window movable to the right and connected to the control unit; and a Post program 


Post program a directed graph in the control unit of the Post machine that directs 
the operation of the machine, i.e., each arc of the graph can trigger the instruction in 
the destination vertex, and instructions change the contents of the machine’s queue 


Post’s problem the question whether there is a c.e. degree strictly between the
Turing degrees 0 and 0’, i.e., whether there is a decision problem whose difficulty is 
strictly between that of the decidable and the complete undecidable problems 


Post’s program (for solving Post’s problem) an attempt to solve the problem by 
using a structural property (namely sparseness of the complement) of c.e. sets that 
would guarantee the existence, undecidability, and incompleteness of such c.e. sets 


Post’s theorem the theorem stating that if a set and its complement are both semi- 
decidable then they are both decidable 


Post’s thesis the assertion that the intuitive notion of what (set) is generable 
is adequately formalized by the notion of generable by a Post normal system 
(or, equivalently, by any model of generation equivalent to the normal system) 


prenex normal form (of a formula in first-order logic) a logically equivalent for- 
mula consisting of a string of quantifiers followed by a quantifier-free formula 


primitive recursion a construction of a function f from two given functions and f
itself in a restricted way; e.g., f(n, 0) = g(n) and f(n, m+1) = h(n, m, f(n, m)), m ≥ 0
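

The scheme f(n, 0) = g(n), f(n, m+1) = h(n, m, f(n, m)) can be sketched as a
higher-order function. This is an illustration only; the names primitive_recursion,
successor, add, and mul are hypothetical.

```python
def primitive_recursion(g, h):
    """Return f with f(n, 0) = g(n) and f(n, m + 1) = h(n, m, f(n, m))."""
    def f(n, m):
        value = g(n)                # f(n, 0)
        for i in range(m):          # unfold the recursion bottom-up
            value = h(n, i, value)  # f(n, i + 1) from f(n, i)
        return value
    return f

def successor(x):
    return x + 1

# add(n, 0) = n;  add(n, m + 1) = successor(add(n, m))
add = primitive_recursion(lambda n: n, lambda n, m, prev: successor(prev))

# mul(n, 0) = 0;  mul(n, m + 1) = add(n, mul(n, m))
mul = primitive_recursion(lambda n: 0, lambda n, m, prev: add(n, prev))
```

The loop runs exactly m times, which reflects why functions built this way are
always total: the number of steps is fixed by the argument before evaluation starts.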


primitive recursive function a function that can be constructed from the zero
function ζ(n) = 0, the successor function σ(n) = n + 1, and the projection func-
tions π_i^k(x_1, ..., x_k) = x_i, using only function composition and primitive recursion


priority method a method for constructing a set that satisfies all requirements in 
an infinite list of requirements; the method organizes actions so that although previ- 
ously satisfied requirements can be injured, in the limit, all requirements are satisfied 


production (in a grammar) a transformation rule of the form α → β describing the
conditions under which, and the manner in which, subwords of a word can be used 
to transform the word into a new word 


projection function (k-place) the function π_i^k(x_1, ..., x_k) = x_i, which maps k-tuples
to their ith components; informally, π_i^k extracts and returns the ith argument


proper axiom an axiom that condenses some fact about specific basic notion(s) 
typical of the current field of interest


proper class a class that is not a member of any class (and is, therefore, not a set;
otherwise it would be a member of its power set, which is a class by definition) 


proper function (of a Turing machine T) a partial function induced by T as follows:
The function maps a word u to a word v iff T halts on u and leaves only v on the 
tape; otherwise the function is undefined for u 


proper set (of a Turing machine T) a set induced by T as follows: A word is in the 
set iff T accepts the word, i.e., after reading it, T eventually halts in a final state 


Q 


quantum algorithm an algorithm that uses some essential feature of quantum 
computation and can run on an abstract computing machine that is an instance of a 
model of quantum computation (e.g., quantum Turing machine, quantum circuit) 


quantum computing the computing that uses computational methods inspired by 
quantum-mechanical principles and phenomena, such as probabilistic universes, 
interference, superposition, and entanglement 


quantum Turing machine a model of quantum computation that is based on the 
(ordinary) Turing machine and captures all of the power of quantum computation; 
any quantum algorithm can be expressed as a particular quantum Turing machine 


R 


random access machine (RAM) a computation model with several arithmetic 
registers and a potentially infinite number of memory registers 


recognizer (of a set) a Turing machine that can determine, for any element in the 
set, that the element is in the set; if in truth the element is not in the set, the machine 
determines that the element is not in the set—or never halts 


recursion (self-reference) the process of defining or expressing something (e.g., a 
function, procedure, language construct, or solution to a problem) in terms of itself 


recursion (or fixed-point) theorem informally, the theorem stating that if a trans- 
formation modifies every Turing program, then some Turing program is transformed 
into an equivalent Turing program, i.e., a program with the same global behavior 


recursive function see (general) recursive function 


reduction (in λ-calculus) the operation that transforms a λ-term by applying one 
of its functions to the function’s arguments and replacing the λ-term representing 
the function and its arguments with the λ-term representing the value of the function 


reduction (of a computational problem) a strategy for solving a problem that trans- 
forms its input into the input for another problem, solves that problem on the trans- 
formed input, and transforms the solution into the solution to the original problem 
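
The three steps of such a reduction can be sketched on a toy pair of problems. This is only an illustration, assuming we want the maximum of a list and already have a solver for the minimum; the names solve_min and solve_max_by_reduction are hypothetical and do not come from the text.

```python
def solve_min(numbers):
    """Solver for the other problem: the minimum of a non-empty list."""
    return min(numbers)


def solve_max_by_reduction(numbers):
    """Solve the maximum problem by reducing it to the minimum problem."""
    transformed = [-x for x in numbers]  # transform the input
    answer = solve_min(transformed)      # solve the other problem
    return -answer                       # transform the solution back


print(solve_max_by_reduction([3, -7, 12, 5]))
```

The reduction itself does no searching of its own; all the work is delegated to the solver of the other problem.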
relativistic machine a hypercomputer consisting of a universal Turing machine 
(stationed on Earth) and a Turing machine (approaching a black hole) that uses 
gravitational time dilation to compute the values of an incomputable function 


Rice’s theorem (for p.c. functions) the theorem stating that intrinsic properties of 
p.c. functions are decidable iff they are trivial 


Rice’s theorem (for c.e. sets) the theorem stating that intrinsic properties of c.e. sets 
are decidable iff they are trivial 


Russell’s paradox the unacceptable conclusion derived in Cantor’s set theory 
stating that there exists a set that both is and is not a member of itself 


S 


search problem a computational problem each instance of which asks for the 
members of a given set that have a given property 


second incompleteness theorem (Gödel’s) the theorem stating that if the Formal 
Arithmetic A is consistent, then this cannot be proved with means available in A 


semantically complete (theory) a consistent theory ℱ such that a formula F is 
derivable in ℱ iff F represents a Truth in ℱ, i.e., F is valid in every model of ℱ; 
informally, in such an ℱ, Truth and nothing but the Truth can be derived 


semantic completeness problem the problem of deciding whether or not a given 
theory is semantically complete 


semi-decidable problem a decision problem for which there is a Turing machine 
that can determine, for any positive instance of the problem, that the answer to the 
instance is YES; if the instance is negative, the machine answers NO—or never halts 


semi-decidable set a set for which there is a Turing machine that can determine, 
for any element in the set, that the element is in the set; if in truth the element is not 
in the set, the machine determines that the element is not in the set—or never halts 
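
The asymmetry between members and non-members can be sketched as follows. This is a minimal illustration that stands in for a Turing machine with a Python function; it assumes the set is given by an enumeration of its members, and the names semi_decide, squares, and the budget parameter are hypothetical.

```python
from itertools import count


def semi_decide(enumerate_set, element, budget=None):
    """Return True once `element` shows up in the enumeration.

    enumerate_set: zero-argument function returning an iterator that lists
    the members of the set in some order, possibly without end.
    With budget=None the call never returns for a non-member of an endless
    enumeration, which is the "never halts" case. A finite budget turns
    that case into None ("no answer yet"), not into NO.
    """
    for step, member in enumerate(enumerate_set()):
        if member == element:
            return True
        if budget is not None and step + 1 >= budget:
            return None
    return False  # the enumeration ended, so the element is not in the set


def squares():
    """Endless enumeration of the perfect squares 0, 1, 4, 9, ..."""
    return (k * k for k in count())


print(semi_decide(squares, 49))
print(semi_decide(squares, 50, budget=1000))
```

The set of squares is of course decidable; it is used here only because its enumeration is easy to write down. For a set that is semi-decidable but not decidable, no finite budget can be turned into a reliable NO.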


semi-Thue system (over an alphabet Σ) a finite set of rules (productions) of the 
form x → y (where x and y are words over Σ) used for investigating whether and 
how a word over Σ can be transformed, using only the given rules, into another 
word over Σ 
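
A rough sketch of such an investigation is a breadth-first search over the words obtainable by the rules. The function name derivable and the max_steps budget are assumptions made for illustration; since the question is undecidable in general, the search may have to give up without an answer.

```python
from collections import deque


def derivable(rules, start, target, max_steps=10_000):
    """Search for a transformation of `start` into `target`.

    rules: list of (x, y) pairs, each standing for a rule x -> y.
    Returns True if a transformation is found, False if every reachable
    word was explored without success, and None if the budget ran out.
    """
    seen = {start}
    queue = deque([start])
    steps = 0
    while queue and steps < max_steps:
        word = queue.popleft()
        steps += 1
        if word == target:
            return True
        for x, y in rules:
            # Apply x -> y at every position where x occurs in the word.
            i = word.find(x)
            while i != -1:
                new_word = word[:i] + y + word[i + len(x):]
                if new_word not in seen:
                    seen.add(new_word)
                    queue.append(new_word)
                i = word.find(x, i + 1)
    return None if queue else False


print(derivable([("ab", "ba")], "aab", "baa"))
```

With the single rule ab → ba, the word aab is transformed into baa via aba. A rule such as a → aa produces ever longer words, so a search for an unreachable target ends only when the budget does.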


simple set informally, a c.e. set whose complement is “sparse”, in the sense that 
the complement, although infinite, does not contain any infinite c.e. set 


s-m-n theorem see parameter theorem 
Σ_n-complete set a Σ_n-set such that every Σ_n-set is m-reducible to it 


Σ_n-set a set of exactly those natural numbers x for which the predicate of the form 
∃y_1 ∀y_2 ∃y_3 ... Q y_n R(x, y_1, y_2, ..., y_n) is true, where Q is ∃ (∀) if n is odd (even), and 
R is a decidable arithmetical relation 
solvable problem a problem, not necessarily a computational one, for which there 
exists a single procedure that can construct a solution to any instance of the problem 


sound theory a theory in which every theorem is valid in every model of the theory; 
informally, a theory in which we cannot deduce something that is not a Truth 


space complexity (of an algorithm) a function whose argument is the size of the 
algorithm’s input and whose value represents the amount of computational space 
needed to execute the algorithm 


start symbol (in a grammar) a designated nonterminal from which every word of 
the language (of the grammar) is generated 


standard model (of a theory) a particular structure (field of interest) in which the 
theory is usually interpreted 


syntactically complete (theory) a theory such that, for any formula F of the theory, 
at least one of the formulas F and ¬F is derivable in the theory; so in a consistent 
and syntactically complete theory every formula is either provable or refutable 


syntactic completeness problem the problem of deciding whether or not a given 
theory is syntactically complete 


T 
T-complete set see Turing-complete set 
terminal (in a grammar) a symbol that cannot be replaced by other symbols 


theorem (of a theory) a formula that can be derived in the theory; informally, a 
statement in mathematics or logic that can be proved by reasoning 


theory a formal idea or set of ideas that is intended to explain something 
thesis an idea that is expressed as a statement and is discussed in a logical way 
Thue system a semi-Thue system where each rule x → y has the reverse rule y → x 


time complexity (of an algorithm) a function whose argument is the size of the al- 
gorithm’s input and whose value represents the number of operations or the running 
time needed to execute the algorithm 


total function a function φ : A → B whose value φ(x) is defined for every x ∈ A 


transfinite induction a generalization of mathematical induction stating that in a 
well-ordered set (S,<) a predicate P holds for every element of S if the following 
condition is met: P holds for y ∈ S if P holds for every x ∈ S such that x < y 


transition function (of a Turing machine) a partial function δ : (q,a) ↦ (p,b,D) 
specifying the machine’s local behavior: In each step, if the machine reads in state 
q symbol a, it writes b, moves the window one cell in direction D, and enters state p 
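
One way to make this local behavior concrete is to store the transition function as a dictionary and apply it step by step. The sketch below is an illustration under stated assumptions: the function run, the state names q0 and qf, the blank symbol "_", and the step budget are all hypothetical choices, not part of the definition.

```python
def run(delta, tape_input, start="q0", blank="_", max_steps=1000):
    """Run a one-tape machine given by the partial function delta.

    delta maps (q, a) to (p, b, D) with D equal to "L" or "R".
    The machine halts when delta is undefined for the current pair.
    Returns (state, tape contents without surrounding blanks), or None
    if the step budget is exhausted first.
    """
    tape = dict(enumerate(tape_input))  # sparse tape: cell index -> symbol
    state, pos = start, 0
    for _ in range(max_steps):
        symbol = tape.get(pos, blank)
        if (state, symbol) not in delta:
            lo, hi = min(tape, default=0), max(tape, default=0)
            cells = "".join(tape.get(i, blank) for i in range(lo, hi + 1))
            return state, cells.strip(blank)
        state, written, direction = delta[(state, symbol)]
        tape[pos] = written                   # write b
        pos += 1 if direction == "R" else -1  # move the window
    return None


# A machine that flips every bit and then stops in state qf.
flip = {
    ("q0", "0"): ("q0", "1", "R"),
    ("q0", "1"): ("q0", "0", "R"),
    ("q0", "_"): ("qf", "_", "R"),
}
print(run(flip, "0110"))
```

Because the function is partial, halting needs no special instruction: the machine simply stops when no entry applies.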
trivial property (of p.c. functions) an intrinsic property of p.c. functions that is 
shared by either every or no p.c. function 


trivial property (of c.e. sets) an intrinsic property of c.e. sets that is shared by 
either every or no c.e. set 


Turing-complete (or T-complete) set a c.e. set such that every c.e. set is Turing- 
reducible to it 


Turing-computable function a function for which there exists a Turing machine 
that can compute the value of the function anywhere the function is defined 


Turing degree (of a set) an equivalence class consisting of all sets that are Turing- 
equivalent to the set 


Turing equivalence (of sets) a relation between two sets stating that if either of the 
sets were decidable then also the other would be decidable 


Turing jump (operator) an operator that maps sets to sets in such a way that the 
Turing degree of the image set is higher than that of the original set 


Turing jump (of a set) the set resulting from a single application of the Turing 
jump operator to the given set 


Turing jump (of a Turing degree) the Turing degree of the set which is the Turing 
jump of an arbitrary set in the given Turing degree 


Turing machine a model of computation that has a control unit, which is always 
in some state; an infinite tape with cells, which may contain symbols; a movable 
window over the tape, which is connected to the control unit; and a Turing program 


Turing program a partial function in the control unit of a Turing machine, which 
directs the operation of the machine, i.e., determines each next move of the machine 
based on the current state of the control unit and the symbol under the window 


Turing reduction (between problems) a relation ≤_T on computational problems 
such that P ≤_T Q iff the existence of an algorithm A_Q for Q would imply the exis- 
tence of an algorithm A_P for P (with A_P allowed to make finitely many calls to A_Q) 


Turing reduction (between sets) a relation ≤_T on sets such that A ≤_T B iff the 
existence of a decider D_B for B would imply the existence of a decider D_A for A 
(with D_A allowed to make finitely many calls to D_B) 
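
The phrase "allowed to make finitely many calls" can be illustrated with a decider that is built from another decider. In this sketch B is taken to be the set of primes and A the set of numbers x such that both x and x + 2 are in B; both sets, and the names is_prime and make_decider_A, are assumptions chosen only to keep the example small.

```python
def is_prime(n):
    """A decider D_B for the set B of prime numbers (trial division)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def make_decider_A(decider_B):
    """Build D_A for A = {x : x in B and x + 2 in B} from any D_B."""
    def decider_A(x):
        # At most two calls to D_B are made for each x.
        return decider_B(x) and decider_B(x + 2)
    return decider_A


decider_A = make_decider_A(is_prime)
print(decider_A(11), decider_A(7))
```

Here D_B happens to be computable, so D_A is as well. The construction in make_decider_A does not depend on that: it turns any decider for B, if one existed, into a decider for A.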


U 


unbounded minimization a construction of a function f from a given function g 
using the μ-operation; e.g., f(n) = μx.g(n,x) defines f(n) to be the smallest m such 
that g(n,m) = 0, if there is one; otherwise f(n) is undefined 
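
The search behind the μ-operation can be sketched as a loop that tries x = 0, 1, 2, ... in turn. The sketch assumes g is a total Python function; the name mu and the optional max_search cutoff are hypothetical, the cutoff being there only so that an undefined value can be demonstrated without an endless loop.

```python
def mu(g, n, max_search=None):
    """Return the smallest x with g(n, x) == 0.

    With max_search=None the loop runs forever when no such x exists,
    which corresponds to f(n) being undefined. With a finite max_search
    the function returns None once the cutoff is reached.
    """
    x = 0
    while max_search is None or x < max_search:
        if g(n, x) == 0:
            return x
        x += 1
    return None


def g(n, x):
    """Equals 0 exactly when x * x >= n."""
    return 0 if x * x >= n else 1


print(mu(g, 10))
```

For this g, mu(g, n) is the smallest x whose square is at least n, so the call above yields 4. A g that never takes the value 0 makes the unbounded search run forever.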
undecidable problem a decision problem for which there is no Turing machine 
capable of deciding, for every instance of the problem, whether the answer to the 
instance is YES or NO 


undecidable set a set for which there is no Turing machine capable of deciding, 
for every element, whether or not the element is in the set 


universal language the language of the Halting Problem; that is, the set K_0 of the 
codes of all positive instances of the Halting Problem, i.e., K_0 = {⟨T,x⟩ | T halts on x} 


universal Turing machine a Turing machine that can simulate execution of any 
other Turing machine on any input 


V 


valid formula (under an interpretation) a formula of a theory which, when inter- 
preted in a field of interest, is a true statement about the elements of the field, for 
every assignment of elements of the field to the free variable symbols of the formula 


valid formula (in a theory) a formula of the theory that is valid in each model 
of the theory; informally, such a formula represents a certain mathematical Truth 
expressible in the theory 


W 
word (over an alphabet) a finite sequence of symbols from the alphabet 


word problem (for semi-groups) the undecidable decision problem asking, for any 
Thue system and any words u and v, whether or not u can be transformed into v in 
the system 


word problem (for groups) the undecidable problem asking, for any Thue system 
such that every symbol a has an annihilating symbol b (i.e., ba → ε and ε → ba are 
rules) and any words u and v, whether or not u can be transformed into v 


Y 


Yes/No problem see decision problem 